PREFACE


The  author  has  felt  that  applied  courses  in  sampling  should  give  more 
attention  to  elementary  theory  of  expected  values  of  a random  variable. 

The  theory  pertaining  to  a random  variable  and  to  functions  of  random 
variables  is  the  foundation  for  probability  sampling.  Interpretations 
of  the  accuracy  of  estimates  from  probability  sample  surveys  are  predicated 
on,  among  other  things,  the  theory  of  expected  values. 

There  are  many  students  with  career  interests  in  surveys  and  the 
application  of  probability  sampling  who  have  very  limited  backgrounds  in 
mathematics  and  statistics.  Training  in  sampling  should  go  beyond  simply 
learning  about  sample  designs  in  a descriptive  manner.  The  foundations 
in  mathematics  and  probability  should  be  included.  It  can  (1)  add  much 
to  the  breadth  of  understanding  of  bias,  random  sampling  error,  components 
of  error,  and  other  technical  concepts;  (2)  enhance  one’s  ability  to  make 
practical adaptations of sampling principles and correct use of formulas;
and  (3)  make  communication  with  mathematical  statisticians  easier  and  more 
meaningful. 

This  monograph  is  intended  as  a reference  for  the  convenience  of 
students  in  sampling.  It  attempts  to  express  relevant,  introductory 
mathematics  and  probability  in  the  context  of  sample  surveys.  Although 
some proofs are presented, the emphasis is more on exposition of mathematical
language and concepts than on the mathematics per se and rigorous
proofs.  Many  problems  are  given  as  exercises  so  a student  may  test  his 
interpretation  or  understanding  of  the  concepts.  Most  of  the  mathematics 
is  elementary.  If  a formula  looks  involved,  it  is  probably  because  it 
represents a long sequence of arithmetic operations.

Each chapter begins with very simple explanations and ends at a much
more  advanced  level.  Most  students  with  only  high  school  algebra  should 
have  no  difficulty  with  the  first  parts  of  each  chapter.  Students  with  a 
few  courses  in  college  mathematics  and  statistics  might  review  the  first 
parts  of  each  chapter  and  spend  considerable  time  studying  the  latter  parts. 
In  fact,  some  students  might  prefer  to  start  with  Chapter  III  and  refer  to 
Chapters  I and  II  only  as  needed. 

Discussion  of  expected  values  of  random  variables,  as  in  Chapter  III, 
was  the  original  purpose  of  this  monograph.  Chapters  I and  II  were  added 
as background for Chapter III. Chapter IV focuses attention on the distribution
of an estimate which is the basis for comparing the accuracy
of alternative sampling plans as well as a basis for statements about the
accuracy of an estimate from a sample. The content of Chapter IV is
included in books on sampling, but it is important that students hear or
read more than one discussion of the distribution of an estimate, especially
with reference to estimates from actual sample surveys.

The author's interest and experience in training have been primarily
with persons who had begun careers in agricultural surveys. I appreciate
the  opportunity,  which  the  Statistical  Reporting  Service  has  provided,  to 
prepare  this  monograph. 


Earl  E.  Houseman 
Statistician


CHAPTER I. NOTATION AND SUMMATION

1.1  INTRODUCTION 

To  work  with  large  amounts  of  data,  an  appropriate  system  of  notation 
is  needed.  The  notation  must  identify  data  by  individual  elements,  and 
provide  meaningful  mathematical  expressions  for  a wide  variety  of  summaries 
from  individual  data.  This  chapter  describes  notation  and  introduces 
summation  algebra,  primarily  with  reference  to  data  from  census  and  sample 
surveys.  The  purpose  is  to  acquaint  students  with  notation  and  summation 
rather than to present statistical concepts. Initially some of the expressions
might seem complex or abstract, but nothing more than sequences of
operations involving addition, subtraction, multiplication, and division
is involved. Exercises are included so a student may test his interpretation
of different mathematical expressions. Algebraic manipulations are
also discussed and some algebraic exercises are included. To a considerable
degree, this chapter could be regarded as a manual of exercises for
students who are interested in sampling but are not fully familiar with
the summation symbol, $\Sigma$. Familiarity with the mathematical language will
make the study of sampling much easier.

1.2  NOTATION  AND  THE  SYMBOL  FOR  SUMMATION 

"Element"  will  be  used  in  this  monograph  as  a general  expression  for 
a unit that a measurement pertains to. An element might be a farm, a person,
a school, a stalk of corn, or an animal. Such units are sometimes
called units of observation or reporting units. Generally, there are
several characteristics or items of information about an element that one
might be interested in.


"Measurement"  or  "value"  will  be  used  as  general  terms  for  the 
numerical  value  of  a specified  characteristic  for  an  element.  This 
includes  assigned  values.  For  example,  the  element  might  be  a farm  and 
the  characteristic  could  be  whether  wheat  is  being  grown  or  is  not  being 
grown  on  a farm.  A value  of  "1"  could  be  assigned  to  a farm  growing  wheat 
and  a value  of  "0"  to  a farm  not  growing  wheat.  Thus,  the  "measurement" 
or "value" for a farm growing wheat would be "1" and for a farm not growing
wheat the value would be "0."

Typically, a set of measurements of N elements will be expressed as
follows: $X_1, X_2, \dots, X_N$ where X refers to the characteristic that is
measured and the index (subscript) to the various elements of the population
(or set). For example, if there are N persons and the characteristic
X is a person's height, then $X_1$ is the height of the first person, etc.
To refer to any one of the elements, not a specific element, a subscript "i"
is used. Thus, $X_i$ (read X sub i) means the value of X for any one of the
N elements. A common expression would be "$X_i$ is the value of X for the
$i$th element."

The Greek letter $\Sigma$ (capital sigma) is generally used to indicate a
sum. When found in an equation, it means "the sum of." For example,
$\sum_{i=1}^{N} X_i$ represents the sum of all values of X from $X_1$ to $X_N$; that is,

$\sum_{i=1}^{N} X_i = X_1 + X_2 + \dots + X_N$.

The lower and upper limits of the index of summation are shown below and
above the summation sign. For example, to specify the sum of X for elements
11 thru 20 one would write $\sum_{i=11}^{20} X_i$.

You might also see notation such as "$\sum X_i$ where i = 1, 2,..., N" which
indicates there are N elements (or values) in the set indexed by serial
numbers 1 thru N, or for part of a set you might see "$\sum X_i$ where i = 11,
12,..., 20." Generally the index of summation starts with 1; so you will
often see a summation written as $\sum_{i}^{N} X_i$. That is, only the upper limit of
the summation is shown and it is understood that the summation begins with
i = 1. Alternatively, when the set of values being summed is clearly understood,
the lower and upper limits might not be shown. Thus, it is understood
that $\sum X_i$ or $\sum_{i} X_i$ is the sum of X over all values of the set under
consideration. Sometimes a writer will even drop the subscript and use
$\sum X$ for the sum of all values of X. Usually the simplest notation that is
adequate for the purpose is adopted. In this monograph, there will be
some deliberate variation in notation to familiarize students with various
representations of data.
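
The summation symbol corresponds closely to a loop, or to a built-in sum, in most programming languages. The following is a minimal Python sketch, assuming a hypothetical list `x` of N = 20 values that does not come from the monograph. Python lists are indexed from 0, so the value $X_i$ is stored at `x[i - 1]`.

```python
# Hypothetical data: 20 illustrative values X_1, ..., X_20 (not from the text).
x = [3, 7, 1, 4, 9, 2, 8, 5, 6, 0, 4, 4, 7, 1, 3, 9, 2, 5, 8, 6]
N = len(x)

# Sum of X_i for i = 1, ..., N (both limits of the index written out).
total = sum(x[i - 1] for i in range(1, N + 1))

# Sum of X_i for i = 11, ..., 20 (elements 11 thru 20 only).
partial = sum(x[i - 1] for i in range(11, 21))

# When the set being summed is clearly understood, the limits can be dropped.
total_short = sum(x)

print(total, partial, total_short)
```

Writing the limits explicitly, as in `range(11, 21)`, plays the same role as showing the lower and upper limits below and above the summation sign.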

An average is usually indicated by a "bar" over the symbol. For
example, $\bar{X}$ (read "X bar," or sometimes "bar X") means the average value of
X. Thus,

$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$.

In this case, showing the upper limit, N, of the summation makes it clear
that the sum is being divided by the number of elements and $\bar{X}$ is the
average of all elements. However, $\frac{\sum X_i}{N}$ would also be interpreted
as the average of all values of X unless there is an indication to
the contrary.

Do not try to study mathematics without pencil and paper. Whenever
the  shorthand  is  not  clear,  try  writing  it  out  in  long  form.  This  will 
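often reduce any ambiguity and save time.

Here are some examples of mathematical shorthand:

(1) Sum of the reciprocals of X:
    $\sum_{i=1}^{N} \frac{1}{X_i} = \frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_N}$

(2) Sum of the differences between X and a constant, C:
    $\sum_{i=1}^{N} (X_i - C) = (X_1 - C) + (X_2 - C) + \dots + (X_N - C)$

(3) Sum of the deviations of X from the average of X:
    $\sum (X_i - \bar{X}) = (X_1 - \bar{X}) + (X_2 - \bar{X}) + \dots + (X_N - \bar{X})$

(4) Sum of the absolute values of the differences between $X_i$ and $\bar{X}$.
    (Absolute value, indicated by the vertical lines, means the positive
    value of the difference):
    $\sum |X_i - \bar{X}| = |X_1 - \bar{X}| + |X_2 - \bar{X}| + \dots + |X_N - \bar{X}|$

(5) Sum of the squares of $X_i$:
    $\sum X_i^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_N^2$

(6) Sum of squares of the deviations of X from $\bar{X}$:
    $\sum (X_i - \bar{X})^2 = (X_1 - \bar{X})^2 + \dots + (X_N - \bar{X})^2$

(7) Average of the squares of the deviations of X from $\bar{X}$:
    $\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N} = \frac{(X_1 - \bar{X})^2 + \dots + (X_N - \bar{X})^2}{N}$

(8) Sum of products of X and Y:
    $\sum_{i=1}^{N} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_N Y_N$

(9) Sum of quotients of X divided by Y:
    $\sum \frac{X_i}{Y_i} = \frac{X_1}{Y_1} + \frac{X_2}{Y_2} + \dots + \frac{X_N}{Y_N}$

(10) Sum of X divided by the sum of Y:
    $\frac{\sum X_i}{\sum Y_i} = \frac{X_1 + X_2 + \dots + X_N}{Y_1 + Y_2 + \dots + Y_N}$

(11) Sum of the first N digits:
    $\sum_{i=1}^{N} i = 1 + 2 + 3 + \dots + N$

(12) $\sum_{i=1}^{N} i X_i = X_1 + 2X_2 + 3X_3 + \dots + N X_N$

(13) $\sum_{i=1}^{6} (-1)^i X_i = -X_1 + X_2 - X_3 + X_4 - X_5 + X_6$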
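
Several of the shorthand expressions (1) through (13) can be evaluated mechanically, which is a useful check on one's reading of the notation. The following Python sketch assumes hypothetical values of X and Y for N = 4 elements; the variable names are illustrative only. Each line mirrors one numbered expression.

```python
# Hypothetical values of X and Y for N = 4 elements (illustration only).
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 4.0, 8.0, 10.0]
N = len(x)
x_bar = sum(x) / N                                      # average of X

sum_reciprocals = sum(1 / xi for xi in x)               # (1)
sum_deviations = sum(xi - x_bar for xi in x)            # (3); zero apart from rounding
sum_abs_dev = sum(abs(xi - x_bar) for xi in x)          # (4)
sum_squares = sum(xi ** 2 for xi in x)                  # (5)
sum_sq_dev = sum((xi - x_bar) ** 2 for xi in x)         # (6)
avg_sq_dev = sum_sq_dev / N                             # (7)
sum_products = sum(xi * yi for xi, yi in zip(x, y))     # (8)
sum_quotients = sum(xi / yi for xi, yi in zip(x, y))    # (9)
ratio_of_sums = sum(x) / sum(y)                         # (10)
sum_index = sum(range(1, N + 1))                        # (11)
weighted = sum(i * xi for i, xi in enumerate(x, start=1))             # (12)
alternating = sum((-1) ** i * xi for i, xi in enumerate(x, start=1))  # (13), here with N terms

print(sum_quotients, ratio_of_sums)   # (9) and (10) are different quantities
```

Expressions (9) and (10) look alike on paper but give different results, which is one reason to write a shorthand expression out in long form whenever there is doubt.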
Exercise 1.1. You are given a set of four elements having the
following values of X: $X_1 = 2$, $X_2 = 0$, $X_3 = 5$, $X_4 = 7$. To test your
understanding of the summation notation, compute the values of the following
algebraic expressions:

      Expression                                              Answer

(1)   $\sum_{i=1}^{4} (X_i + 4)$                              30

(2)   $\sum 2(X_i - 1)$                                       20

(3)   $2\sum (X_i - 1)$                                       20

(4)   $\sum 2X_i - 1$                                         27

(5)   $\bar{X} = \frac{\sum X_i}{N}$                          3.5

(6)   $\sum X_i^2$                                            78

(7)   $\sum (-X_i)^2$                                         78

(8)   $[\sum X_i]^2$                                          196

(9)   $\sum (X_i^2 - X_i)$                                    64

(10)  $\sum (X_i^2) - \sum X_i$                               64

(11)  $\sum i(X_i)$                                           45

(12)  $\sum (-1)^i (X_i)$                                     0

(13)  $\sum_{i=1}^{4} (X_i^2 - 3)$                            66

(14)  $\sum_{i=1}^{4} X_i^2 - \sum_{i=1}^{4} (3)$             66

      Note: $\sum_{i=1}^{4} (3)$ means find the sum of four 3's.

(15)  $\sum (X_i - \bar{X})$                                  0

(16)  $\frac{\sum (X_i - \bar{X})^2}{N-1}$                    $\frac{29}{3}$

(17)  $\frac{\sum [X_i^2 - 2X_i\bar{X} + \bar{X}^2]}{N-1}$    $\frac{29}{3}$

(18)  $\frac{\sum X_i^2 - N\bar{X}^2}{N-1}$                   $\frac{29}{3}$

Definition 1.1. The variance of X where $X = X_1, X_2, \dots, X_N$, is
defined in one of two ways:

$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$

or

$S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}$

The  reason  for  the  two  definitions  will  be  explained  in  Chapter  III. 
The  variance  formulas  provide  measures  of  how  much  the  values  of  X vary 
(deviate)  from  the  average.  The  square  root  of  the  variance  of  X is 
called  the  standard  deviation  of  X.  The  central  role  that  the  above 
definitions  of  variance  and  standard  deviation  play  in  sampling  theory 
will become apparent as you study sampling. The variance of an estimate
from a sample is one of the measures needed to judge the accuracy of the
estimate and to evaluate alternative sampling designs. Much of the algebra
and notation in this chapter is related to computation of variance. For
complex sampling plans, variance formulas are complex. This chapter
should  help  make  the  mathematics  used  in  sampling  more  readable  and  more 
meaningful  when  it  is  encountered. 
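
The two forms of Definition 1.1 differ only in the divisor. The following Python sketch computes both for the values of X given in Exercise 1.1. The function name `variances` is hypothetical, and the sketch assumes integer values so that exact fractions can be used instead of decimals.

```python
from fractions import Fraction
from math import sqrt


def variances(values):
    """Return the sum of squared deviations divided by N and by N - 1."""
    n = len(values)
    mean = Fraction(sum(values), n)                      # X bar, kept exact
    ss = sum((Fraction(v) - mean) ** 2 for v in values)  # sum of (X_i - X bar)^2
    return ss / n, ss / (n - 1)


x = [2, 0, 5, 7]                 # values of X from Exercise 1.1
sigma2, s2 = variances(x)
print(sigma2, s2)                # exact fractions
print(sqrt(s2))                  # standard deviation based on the N - 1 divisor
```

Working with `Fraction` avoids rounding, so the results can be compared directly with answers that are stated as fractions.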

Definition  1.2.  "Population"  is  a statistical  term  that  refers  to 
a set  of  elements  from  which  a sample  is  selected  ("Universe"  is  often 
used  instead  of  "Population"). 

Some  examples  of  populations  are  farms,  retail  stores,  students, 
households,  manufacturers,  and  hospitals.  A complete  definition  of  a 
population  is  a detailed  specification  of  the  elements  that  compose  it. 

Data to be collected also need to be defined. Problems of defining populations
to be surveyed should receive much attention in courses on sampling.
From a defined population a sample of elements is selected, information
for each element in the sample is collected, and inferences from the sample
are made about the population. Nearly all populations for sample
surveys are finite so the mathematics and discussion in this monograph
are limited to finite populations.

In the theory of sampling, it is important to distinguish between
data for elements in a sample and data for elements in the entire population.
Many writers use uppercase letters when referring to the population
and lowercase letters when referring to a sample. Thus $X_1, \dots, X_N$ would
represent the values of some characteristic X for the N elements of the
population; and $x_1, \dots, x_n$ would represent the values of X in a sample of
n elements. The subscripts in $x_1, \dots, x_n$ simply index the different
elements in a sample and do not correspond to the subscripts in $X_1, \dots, X_N$
which index the elements of the population. In other words, $x_i$ could be
any one of the $X_i$'s. Thus,

$\frac{\sum_{i}^{N} X_i}{N} = \bar{X}$ represents the population mean, and

$\frac{\sum_{i}^{n} x_i}{n} = \bar{x}$ represents a sample mean.

In  this  chapter  we  will  be  using  only  uppercase  letters,  except  for 
constants and subscripts, because the major emphasis is on symbolic representation
of data for a set of elements and on algebra. For this purpose,
it  is  sufficient  to  start  with  data  for  a set  of  elements  and  not  be 
concerned  with  whether  the  data  are  for  a sample  of  elements  or  for  all 
elements  in  a population. 

The letters X, Y, and Z are often used to represent different characteristics
(variables) whereas the first letters of the alphabet are commonly
used as constants. There are no fixed rules regarding notation. For
example, four different variables or characteristics might be called $X_1$,
$X_2$, $X_3$, and $X_4$. In that case $X_{1i}$ might be used to represent the $i$th value
of the variable $X_1$. Typically, writers adopt notation that is convenient
for their problems. It is not practical to completely standardize notation.

Exercise 1.2. In the list of expressions in Exercise 1.1 find the
variance of X, that is, find $S^2$. Suppose that $X_4$ is 15 instead of 7. How
much is the variance of X changed? Answer: From $9\frac{2}{3}$ to $44\frac{1}{3}$.

Exercise 1.3. You are given four elements having the following values
of X and Y:

$X_1 = 2$   $X_2 = 0$   $X_3 = 5$   $X_4 = 7$

$Y_1 = 2$   $Y_2 = 3$   $Y_3 = 1$   $Y_4 = 14$

Find the value of the following expressions:

      Expression                                  Answer

(1)   $\sum X_i Y_i$                              107

(2)   $(\sum X_i)(\sum Y_i)$                      280

(3)   $\sum (X_i - \bar{X})(Y_i - \bar{Y})$       37

(4)   $\sum X_i Y_i - N\bar{X}\bar{Y}$            37

(5)   $\frac{1}{N} \sum \frac{X_i}{Y_i}$          1.625

(6)   $\sum (X_i - Y_i)$                          -6

(7)   $\sum X_i - \sum Y_i$                       -6

(8)   $\sum (X_i - Y_i)^2$                        74

(9)   $\sum (X_i^2 - Y_i^2)$                      -132

(10)  $\sum X_i^2 - \sum Y_i^2$                   -132

(11)  $[\sum (X_i - Y_i)]^2$                      36

(12)  $[\sum X_i]^2 - [\sum Y_i]^2$               -204

1.3  FREQUENCY  DISTRIBUTIONS 

Several elements in a set of N might have the same value for some
characteristic X. For example, many people have the same age. Let $X_j$
be a particular age and let $N_j$ be the number of people in a population
(set) of N people who have the age $X_j$. Then $\sum_{j=1}^{K} N_j = N$ where K is the
number of different ages found in the population. Also $\sum N_j X_j$ is the sum
of the ages of the N people in the population and $\frac{\sum N_j X_j}{\sum N_j}$ represents the
average age of the N people. A listing of $X_j$ and $N_j$ is called the
frequency distribution of X, since $N_j$ is the number of times (frequency)
that the age $X_j$ is found in the population.

On the other hand, one could let $X_i$ represent the age of the $i$th
individual in a population of N people. Notice that j was an index of age.
We are now using i as an index of individuals, and the average age would
be written as $\frac{\sum X_i}{N}$. Note that $\sum N_j X_j = \sum X_i$ and that
$\frac{\sum N_j X_j}{\sum N_j} = \frac{\sum X_i}{N}$. The
choice between these two symbolic representations of the age of people in
the population is a matter of convenience and purpose.
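
The two representations of the same data can be compared directly. The sketch below assumes a hypothetical list of ages for N = 8 people and uses `collections.Counter` from the Python standard library to build the frequency distribution, that is, the listing of each distinct age $X_j$ with its frequency $N_j$.

```python
from collections import Counter

# Hypothetical ages of N = 8 people, indexed by individual i (illustration only).
ages = [30, 25, 30, 41, 25, 30, 41, 52]
N = len(ages)

freq = Counter(ages)     # maps each distinct age X_j to its frequency N_j
K = len(freq)            # number of different ages found

sum_by_individual = sum(ages)                                    # sum of X_i over i
sum_by_frequency = sum(n_j * x_j for x_j, n_j in freq.items())   # sum of N_j X_j over j

mean_by_individual = sum_by_individual / N
mean_by_frequency = sum_by_frequency / sum(freq.values())

print(sorted(freq.items()))
print(mean_by_individual, mean_by_frequency)
```

Both routes give the same total and the same average; the frequency form simply groups equal values before adding them.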

Exercise 1.4. Suppose there are 20 elements in a set (that is, N = 20)
and that the values of X for the 20 elements are: 4, 8, 3, 7, 8, 8, 3, 3,
7, 2, 8, 4, 8, 8, 3, 7, 8, 10, 3, 8.

(1) List the values of $X_j$ and $N_j$, where j is an index of the
    values 2, 3, 4, 7, 8, and 10. This is the frequency
    distribution of X.

(2) What is K equal to?

Interpret and verify the following by making the calculations indicated:

(3) $\sum_{i=1}^{N} X_i = \sum_{j=1}^{K} N_j X_j$

(4) $\frac{\sum X_i}{N} = \frac{\sum N_j X_j}{\sum N_j} = \bar{X}$

(5) $\frac{\sum (X_i - \bar{X})^2}{N} = \frac{\sum N_j (X_j - \bar{X})^2}{\sum N_j}$


1.4  ALGEBRA 


In  arithmetic  and  elementary  algebra,  the  order  of  the  numbers  when 
addition  or  multiplication  is  performed  does  not  affect  the  results.  The 
familiar  arithmetic  laws  when  extended  to  algebra  involving  the  summation 
symbol  lead  to  the  following  important  rules  or  theorems : 

Rule 1.1   $\sum (X_i - Y_i + Z_i) = \sum X_i - \sum Y_i + \sum Z_i$

   or   $\sum (X_{1i} + X_{2i} + \dots + X_{Ki}) = \sum X_{1i} + \sum X_{2i} + \dots + \sum X_{Ki}$

Rule 1.2   $\sum aX_i = a\sum X_i$ where a is a constant

Rule 1.3   $\sum (X_i + b) = \sum X_i + Nb$ where b is a constant

If it is not obvious that the above equations are correct, write both
sides of each equation as series and note that the difference between the
two sides is a matter of the order in which the summation (arithmetic) is
performed. Note that the use of parentheses in Rule 1.3 means that b is
contained in the series N times. That is,

$\sum_{i=1}^{N} (X_i + b) = (X_1 + b) + (X_2 + b) + \dots + (X_N + b)$

$= (X_1 + X_2 + \dots + X_N) + Nb$

On the basis of Rule 1.1, we can write

$\sum_{i=1}^{N} (X_i + b) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} b$

The expression $\sum_{i=1}^{N} b$ means "sum the value of b, which occurs N times." Therefore,

$\sum_{i=1}^{N} b = Nb$.

Notice that if the expression had been $\sum_{i}^{N} X_i + b$, then b is an amount to add
to the sum, $\sum_{i}^{N} X_i$.

In many equations $\bar{X}$ will appear; for example, $\sum_{i}^{N} \bar{X}X_i$ or $\sum_{i}^{N} (X_i - \bar{X})$.
Since $\bar{X}$ is constant with regard to the summation, $\sum \bar{X}X_i = \bar{X}\sum X_i$. Thus,
$\sum_{i} (X_i - \bar{X}) = \sum_{i} X_i - \sum_{i} \bar{X} = \sum_{i} X_i - N\bar{X}$. By definition $\bar{X} = \frac{\sum_{i} X_i}{N}$. Therefore,
$N\bar{X} = \sum_{i} X_i$ and $\sum_{i} (X_i - \bar{X}) = 0$.

To work with an expression like $\sum_{i}^{N} (X_i + b)^2$ we must square the quantity
in parentheses before summing. Thus,

$\sum_{i} (X_i + b)^2 = \sum_{i} (X_i^2 + 2bX_i + b^2)$

$= \sum X_i^2 + \sum 2bX_i + \sum b^2$          Rule 1.1

$= \sum X_i^2 + 2b\sum X_i + Nb^2$          Rules 1.2 and 1.3

Verify this result by using series notation. Start with $(X_1 + b)^2 + \dots + (X_N + b)^2$.
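
Rules 1.1 through 1.3 and the expansion of $\sum (X_i + b)^2$ can also be checked numerically. The Python sketch below assumes hypothetical values of X, Y, and Z and hypothetical constants a and b; a numerical check of this kind is not a proof, but it catches most mistakes in algebra.

```python
# Hypothetical values and constants (illustration only).
x = [2, 0, 5, 7]
y = [1, 3, 3, 4]
z = [6, 2, 0, 1]
a, b = 3, 4
N = len(x)

# Rule 1.1: the sum of (X_i - Y_i + Z_i) equals sum X - sum Y + sum Z.
rule_1_1 = sum(xi - yi + zi for xi, yi, zi in zip(x, y, z)) == sum(x) - sum(y) + sum(z)

# Rule 1.2: a constant factor can be moved outside the summation.
rule_1_2 = sum(a * xi for xi in x) == a * sum(x)

# Rule 1.3: adding b inside the parentheses adds it N times.
rule_1_3 = sum(xi + b for xi in x) == sum(x) + N * b

# Expansion of the sum of (X_i + b)^2 using Rules 1.1 to 1.3.
lhs = sum((xi + b) ** 2 for xi in x)
rhs = sum(xi ** 2 for xi in x) + 2 * b * sum(x) + N * b ** 2

print(rule_1_1, rule_1_2, rule_1_3, lhs == rhs)
```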

It is very important that the ordinary rules of algebra pertaining to
the use of parentheses be observed. Students frequently make errors
because inadequate attention is given to the placement of parentheses or
to the interpretation of parentheses. Until you become familiar with the
above rules, practice translating shorthand to series and series to shorthand.
Study the following examples carefully:

(1) $\sum (X_i)^2 \neq (\sum X_i)^2$
    The left-hand side is the sum of the squares of $X_i$. The right-hand
    side is the square of the sum of $X_i$. On the right the parentheses are
    necessary. The left side could have been written $\sum X_i^2$.

(2) [expression not legible in the source]
    Rule 1.2 applies.

(3) $\sum (X_i + Y_i)^2 \neq \sum X_i^2 + \sum Y_i^2$
    A quantity in parentheses must be squared before taking a sum.

(4) $\sum (X_i^2 + Y_i^2) = \sum X_i^2 + \sum Y_i^2$
    Rule 1.1 applies.

(5) $\sum X_i Y_i \neq (\sum X_i)(\sum Y_i)$
    The left side is the sum of products. The right side is the product of
    sums.

(6) $\sum (X_i - Y_i)^2 = \sum X_i^2 - 2\sum X_i Y_i + \sum Y_i^2$

(7) $\sum_{i}^{N} a(X_i - b) \neq a\sum_{i}^{N} X_i - ab$

(8) $\sum_{i}^{N} a(X_i - b) = a\sum_{i}^{N} X_i - Nab$

(9) $a[\sum_{i}^{N} X_i - b] = a\sum_{i}^{N} X_i - ab$

(10) $\sum X_i (X_i - Y_i) = \sum X_i^2 - \sum X_i Y_i$
Exercise 1.5. Prove the following:

In all cases, assume i = 1, 2,..., N.

(1) $\sum (X_i - \bar{X}) = 0$

(2) $\sum \frac{X_i Y_i}{X_i^2} = \sum \frac{Y_i}{X_i}$

(3) $N\bar{X}^2 = \frac{(\sum X_i)^2}{N}$

(4) $\sum_{i=1}^{N} (aX_i + bY_i + C) = a\sum X_i + b\sum Y_i + NC$

Note: Equations (5) and (6) should be (or become) very familiar equations.

(5) $\sum (X_i - \bar{X})^2 = \sum X_i^2 - N\bar{X}^2$

(6) $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - N\bar{X}\bar{Y}$

(7) $\sum \left(\frac{X_i}{a} + Y_i\right)^2 = \frac{1}{a^2} \sum (X_i + aY_i)^2$

(8) Let $Y_i = a + bX_i$, show that $\bar{Y} = a + b\bar{X}$
    and $\sum Y_i^2 = Na(a + 2b\bar{X}) + b^2 \sum X_i^2$

(9) Assume that $X_i = 1$ for $N_1$ elements of a set and that $X_i = 0$
    for $N_0$ of the elements. The total number of elements in the
    set is $N = N_1 + N_0$. Let $\frac{N_1}{N} = P$ and $\frac{N_0}{N} = Q$. Prove that
    $\frac{\sum (X_i - \bar{X})^2}{N} = PQ$.

(10) $\sum (X_i - d)^2 = \sum (X_i - \bar{X})^2 + N(\bar{X} - d)^2$. Hint: Rewrite $(X_i - d)^2$
    as $[(X_i - \bar{X}) + (\bar{X} - d)]^2$. Recall from elementary algebra that
    $(a + b)^2 = a^2 + 2ab + b^2$ and think of $(X_i - \bar{X})$ as a and of $(\bar{X} - d)$
    as b. For what value of d is $\sum (X_i - d)^2$ a minimum?

1.5  DOUBLE  INDEXES  AND  SUMMATION 

When there is more than one characteristic for a set of elements,
the different characteristics might be distinguished by using a different
letter for each or by an index. For example, $X_i$ and $Y_i$ might represent
the number of acres of wheat planted and the number of acres of wheat
harvested on the $i$th farm. Or, $X_{ij}$ might be used where i is the index
for the characteristics and j is the index for elements; that is, $X_{ij}$
would be the value of characteristic $X_i$ for the $j$th element. However,
when data on each of several characteristics for a set of elements are
to be processed in the same way, it might not be necessary to use
notation that distinguishes the characteristics. Thus, one might say
calculate $\frac{\sum (X_i - \bar{X})^2}{N-1}$ for all characteristics.

More than one index is needed when the elements are classified according
to more than one criterion. For example, $X_{ij}$ might represent the value
of characteristic X for the $j$th farm in the $i$th county; or $X_{ijk}$ might be
the value of X for the $k$th household in the $j$th block in the $i$th city.
As another example, suppose the processing of data for farms involves
classification of farms by size and type. We might let $X_{ijk}$ represent
the value of characteristic X for the $k$th farm in the subset of farms
classified as type j and size i. If $N_{ij}$ is the number of farms classified
as type j and size i, then

$\frac{\sum_{k}^{N_{ij}} X_{ijk}}{N_{ij}} = \bar{X}_{ij.}$

is the average value of X for the subset of farms classified as type j and size i.

There  are  two  general  kinds  of  classification — cross  classification 
and  hierarchal  or  nested  classification.  Both  kinds  are  often  involved 
in  the  same  problem.  However,  we  will  discuss  each  separately.  An 
example  of  nested  classification  is  farms  within  counties,  counties  within 
States,  and  States  within  regions.  Cross  classification  means  that  the 
data  can  be  arranged  in  two  or  more  dimensions  as  illustrated  in  the  next 
section. 

1.5.1  CROSS  CLASSIFICATION 

As a specific illustration of cross classification and summation with
two indexes, suppose we are working with the acreages of K crops on a set
of N farms. Let $X_{ij}$ represent the acreage of the $i$th crop on the $j$th farm
where i = 1, 2,..., K and j = 1, 2,..., N. In this case, the data could
be arranged in a K by N matrix as follows:

                           Column (j)
Row (i)         1       ...      j       ...      N          Row total

1            $X_{11}$   ...   $X_{1j}$   ...   $X_{1N}$      $\sum_{j} X_{1j}$
.
.
i            $X_{i1}$   ...   $X_{ij}$   ...   $X_{iN}$      $\sum_{j} X_{ij}$
.
.
K            $X_{K1}$   ...   $X_{Kj}$   ...   $X_{KN}$      $\sum_{j} X_{Kj}$

Column
total     $\sum_{i} X_{i1}$  ...  $\sum_{i} X_{ij}$  ...  $\sum_{i} X_{iN}$   $\sum_{i}\sum_{j} X_{ij}$



The expression $\sum_{j} X_{ij}$ (or $\sum_{j}^{N} X_{ij}$) means the sum of the values of $X_{ij}$ for a
fixed value of i. Thus, with reference to the matrix, $\sum_{j} X_{ij}$ is the total
of the values of X in the $i$th row; or, with reference to the example about
farms and crop acreages, $\sum_{j} X_{ij}$ would be the total acreage on all farms of
whatever the $i$th crop is. Similarly, $\sum_{i} X_{ij}$ (or $\sum_{i}^{K} X_{ij}$) is the column total
for the $j$th column, which in the example is the total for the $j$th farm of
the acreages of the K crops under consideration. The sum of all values of
X could be written as $\sum_{i}^{K}\sum_{j}^{N} X_{ij}$ or $\sum_{ij} X_{ij}$.
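
Row totals, column totals, and the grand total translate directly into sums over one index or over both. The Python sketch below assumes a hypothetical matrix of acreages for K = 3 crops (rows) on N = 4 farms (columns), stored as a list of rows so that `acres[i][j]` plays the role of $X_{ij}$ with indexes starting at 0.

```python
# Hypothetical acreages: K = 3 crops (rows) on N = 4 farms (columns).
acres = [
    [10, 0, 25, 5],
    [40, 30, 0, 20],
    [0, 15, 15, 10],
]
K = len(acres)
N = len(acres[0])

# Sum over j for each fixed i: total acreage of each crop on all farms.
row_totals = [sum(acres[i][j] for j in range(N)) for i in range(K)]

# Sum over i for each fixed j: total acreage of the K crops on each farm.
column_totals = [sum(acres[i][j] for i in range(K)) for j in range(N)]

# Sum over both indexes: the grand total.
grand_total = sum(acres[i][j] for i in range(K) for j in range(N))

print(row_totals, column_totals, grand_total)
```

The grand total can be reached by adding either the row totals or the column totals, which is the point of writing a double sum as a sum of partial sums.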

Double summation means the sum of sums. Breaking a double sum into
parts can be an important aid to understanding it. Here are two examples:

(1) $\sum_{i}^{K}\sum_{j}^{N} X_{ij} = \sum_{j}^{N} X_{1j} + \sum_{j}^{N} X_{2j} + \dots + \sum_{j}^{N} X_{Kj}$          (1.1)

With reference to the above matrix, Equation (1.1) expresses the grand total
as the sum of row totals.

(2) $\sum_{i}^{K}\sum_{j}^{N} X_{ij}(Y_{ij} + a) = \sum_{j}^{N} X_{1j}(Y_{1j} + a) + \dots + \sum_{j}^{N} X_{Kj}(Y_{Kj} + a)$          (1.2)

    $\sum_{j}^{N} X_{1j}(Y_{1j} + a) = X_{11}(Y_{11} + a) + \dots + X_{1N}(Y_{1N} + a)$

In Equations (1.1) and (1.2) the double sum is written as the sum of K
partial sums, that is, one partial sum for each value of i.

Exercise 1.6. (a) Write an equation similar to Equation (1.1) that
expresses the grand total as the sum of column totals. (b) Involved in
Equation (1.2) are KN terms, $X_{ij}(Y_{ij} + a)$. Write these terms in the form of
a matrix.

The rules given in Section 1.4 also apply to double summation.
Thus,

$\sum_{i}^{K}\sum_{j}^{N} X_{ij}(Y_{ij} + a) = \sum_{i}^{K}\sum_{j}^{N} X_{ij}Y_{ij} + a\sum_{i}^{K}\sum_{j}^{N} X_{ij}$          (1.3)

Study Equation (1.3) with reference to the matrix called for in Exercise
1.6(b). To fully understand Equation (1.3), you might need to write out
intermediate steps for getting from the left-hand side to the right-hand
side of the equation.

To simplify notation, a system of dot notation is commonly used, for
example:

$\sum_{j} X_{ij} = X_{i.}$

$\sum_{i} X_{ij} = X_{.j}$

$\sum_{i}\sum_{j} X_{ij} = X_{..}$

The dot in $X_{i.}$ indicates that an index in addition to i is involved and
$X_{i.}$ is interpreted as the sum of the values of X for a fixed value of i.
Similarly, $X_{.j}$ is the sum of X for any fixed value of j, and $X_{..}$ represents
a sum over both indexes. As stated above, averages are indicated by use of
a bar. Thus $\bar{X}_{i.}$ is the average of $X_{ij}$ for a fixed value of i, namely

$\frac{\sum_{j=1}^{N} X_{ij}}{N} = \bar{X}_{i.}$

and $\bar{X}_{..}$ would represent the average of all values of $X_{ij}$, namely

$\frac{\sum_{i}\sum_{j} X_{ij}}{NK}$.


Here is an example of how the dot notation can simplify an algebraic
expression. Suppose one wishes to refer to the sum of the squares of the
row totals in the above matrix. This would be written as $\sum_{i} (X_{i.})^2$. The sum
of squares of the row means would be $\sum_{i} (\bar{X}_{i.})^2$. Without the dot notation the
corresponding expressions would be

$\sum_{i}^{K} \left(\sum_{j}^{N} X_{ij}\right)^2$ and $\sum_{i}^{K} \left(\frac{\sum_{j}^{N} X_{ij}}{N}\right)^2$.

It is very important that the parentheses be used correctly. For example,
$\sum_{i}^{K} \left(\sum_{j}^{N} X_{ij}\right)^2$ is not the same as $\sum_{i}^{K}\sum_{j}^{N} X_{ij}^2$. Incidentally, what is the
difference between the last two expressions?

Using the dot notation, the variance of the row means could be written
as follows:

$V(\bar{X}_{i.}) = \frac{\sum_{i}^{K} (\bar{X}_{i.} - \bar{X}_{..})^2}{K-1}$          (1.4)

where V stands for variance and $V(\bar{X}_{i.})$ is an expression for the variance of
$\bar{X}_{i.}$. Without the dot notation, or something equivalent to it, a formula
for the variance of the row means would look much more complicated.
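
The quantities written with dot notation can be computed step by step. The Python sketch below assumes a small hypothetical K = 2 by N = 3 matrix and evaluates the totals $X_{i.}$, $X_{.j}$, and $X_{..}$, the row means, and the variance of the row means as written in Equation (1.4). Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

x = [[1, 2, 3],
     [4, 5, 6]]          # hypothetical K = 2 by N = 3 matrix (illustration only)
K, N = len(x), len(x[0])

row_totals = [sum(row) for row in x]                        # X_i.
col_totals = [sum(row[j] for row in x) for j in range(N)]   # X_.j
grand_total = sum(row_totals)                               # X_..

row_means = [Fraction(t, N) for t in row_totals]            # X bar_i.
grand_mean = Fraction(grand_total, K * N)                   # X bar_..

# Equation (1.4): variance of the row means, with divisor K - 1.
var_row_means = sum((m - grand_mean) ** 2 for m in row_means) / (K - 1)

# Sum of squared row totals versus sum of all squared values.
sum_sq_row_totals = sum(t ** 2 for t in row_totals)
sum_sq_values = sum(v ** 2 for row in x for v in row)

print(row_totals, col_totals, grand_total)
print(var_row_means, sum_sq_row_totals, sum_sq_values)
```

The last two quantities differ because the first squares each row total after summing over j, while the second squares each value before any summing.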

Exercise  1.7.  Write  an  equation,  like  Equation  (1.4),  for  the  variance 
of  the  column  means. 

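Exercise 1.8. Given the following values of $X_{ij}$:

                     j
   i        1      2      3      4

   1        8     11      9     14

   2       10     13     11     14

   3       12     15     10     17

Find the value of the following algebraic expressions:

      Expression                                                                   Answer

(1)   $\sum_{j}^{N} X_{1j}$                                                        42

(2)   $\frac{\sum_{j}^{N} X_{2j}}{N}$                                              12

(3)   $\bar{X}_{3.}$                                                               13.5

(4)   $\sum_{i}^{K} X_{i4}$                                                        45

(5)   $\sum_{i}^{K}\sum_{j}^{N} X_{ij}$                                            144

(6)   $\bar{X}_{..}$                                                               12

(7)   $\sum_{i}^{K}\sum_{j}^{N} (X_{ij} - \bar{X}_{..})^2$                         78

(8)   $N\sum_{i}^{K} (\bar{X}_{i.} - \bar{X}_{..})^2$                              18

(9)   $K\sum_{j}^{N} (\bar{X}_{.j} - \bar{X}_{..})^2$                              54

(10)  $\sum_{i}^{K}\sum_{j}^{N} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$   6

(11)  $\sum_{i}^{K}\sum_{j}^{N} X_{ij}^2 - \frac{\left(\sum_{i}^{K}\sum_{j}^{N} X_{ij}\right)^2}{KN}$   78

(12)  $\frac{\sum_{i}^{K} X_{i.}^2}{N} - \frac{\left(\sum_{i}^{K}\sum_{j}^{N} X_{ij}\right)^2}{KN}$     18

(13)  $\sum_{j}^{N} (X_{1j} - \bar{X}_{1.})^2$                                     21

(14)  $\sum_{i}^{K}\sum_{j}^{N} (X_{ij} - \bar{X}_{i.})^2$                         60
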

Illustration 1.1. To introduce another aspect of notation, refer to
the matrix in Section 1.5.1 and suppose that the values of X in row one are to
be multiplied by $a_1$, the values of X in row two by $a_2$, etc. The matrix
would then be

$a_1 X_{11}$   ...   $a_1 X_{1j}$   ...   $a_1 X_{1N}$
.
.
$a_i X_{i1}$   ...   $a_i X_{ij}$   ...   $a_i X_{iN}$
.
.
$a_K X_{K1}$   ...   $a_K X_{Kj}$   ...   $a_K X_{KN}$

The general term can be written as $a_i X_{ij}$ because the index of a and the
index i in $X_{ij}$ are the same. The total of all KN values of $a_i X_{ij}$ is
$\sum_{i}^{K}\sum_{j}^{N} a_i X_{ij}$. Since $a_i$ is constant with respect to summation involving j,
we can place $a_i$ ahead of the summation symbol $\sum_{j}^{N}$. That is,

$\sum_{i}\sum_{j} a_i X_{ij} = \sum_{i} a_i \sum_{j} X_{ij}$.
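
Moving $a_i$ ahead of the inner summation can be checked on a small example. The Python sketch below assumes a hypothetical K = 2 by N = 3 matrix and hypothetical row multipliers; it evaluates the double sum term by term and then in the factored form.

```python
x = [[1, 2, 3],
     [4, 5, 6]]      # hypothetical K = 2 by N = 3 matrix (illustration only)
a = [10, -1]         # hypothetical row multipliers a_1, a_2
K, N = len(x), len(x[0])

# Total of all KN terms a_i * X_ij, added one term at a time.
double_sum = sum(a[i] * x[i][j] for i in range(K) for j in range(N))

# a_i is constant within row i, so it can multiply the row total instead.
factored = sum(a[i] * sum(x[i][j] for j in range(N)) for i in range(K))

print(double_sum, factored)
```

The factored form needs only K multiplications instead of KN, which is the practical benefit of placing a constant ahead of a summation symbol.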


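Exercise 1.9. Refer to the matrix of values of $X_{ij}$ in Exercise 1.8.
Assume that $a_1 = -1$, $a_2 = 0$, and $a_3 = 1$.

Calculate:

(1) $\sum_{i}\sum_{j} a_i X_{ij}$

(2) $\sum_{i}\sum_{j} \frac{a_i X_{ij}}{N}$

(3) $\sum_{i}\sum_{j} a_i X_{ij}^2$          Answer: 296

Show algebraically that:

(4) $\sum_{i}\sum_{j} a_i X_{ij} = \sum_{j} X_{3j} - \sum_{j} X_{1j}$

(5) $\sum_{i}\sum_{j} \frac{a_i X_{ij}}{N} = \bar{X}_{3.} - \bar{X}_{1.}$

(6) $\sum_{i}\sum_{j} a_i X_{ij}^2 = \sum_{j} X_{3j}^2 - \sum_{j} X_{1j}^2$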

Exercise 1.10. Study the following equation and, if necessary, write
the summations as series to be satisfied that the equation is correct:

$$\sum_{i}^{K}\sum_{j}^{N} (aX_{ij}+bY_{ij}) = a\sum_{i}\sum_{j} X_{ij} + b\sum_{i}\sum_{j} Y_{ij}$$

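Illustration 1.2. Suppose

$$Y_{ij} = X_{ij}+a_i+b_j+c \quad \text{where } i = 1, 2, \ldots, K \text{ and } j = 1, 2, \ldots, N$$

The values of Y can be arranged in matrix format as follows:

$$
\begin{matrix}
Y_{11} = X_{11}+a_1+b_1+c & \cdots & Y_{1N} = X_{1N}+a_1+b_N+c \\
\vdots & Y_{ij} = X_{ij}+a_i+b_j+c & \vdots \\
Y_{K1} = X_{K1}+a_K+b_1+c & \cdots & Y_{KN} = X_{KN}+a_K+b_N+c
\end{matrix}
$$

Notice that $a_i$ is a quantity that varies from row to row but is constant
within a row and that $b_j$ varies from column to column but is constant
within a column. Applying the rules regarding the summation symbols we
have

$$\sum_{j}^{N} Y_{ij} = \sum_{j}^{N} (X_{ij}+a_i+b_j+c) = \sum_{j}^{N} X_{ij} + Na_i + \sum_{j}^{N} b_j + Nc$$

$$\sum_{i}^{K} Y_{ij} = \sum_{i}^{K} (X_{ij}+a_i+b_j+c) = \sum_{i}^{K} X_{ij} + \sum_{i}^{K} a_i + Kb_j + Kc$$

$$\sum_{i}^{K}\sum_{j}^{N} Y_{ij} = \sum_{i}^{K}\sum_{j}^{N} (X_{ij}+a_i+b_j+c) = \sum_{i}^{K}\sum_{j}^{N} X_{ij} + N\sum_{i}^{K} a_i + K\sum_{j}^{N} b_j + KNc$$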
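The summation rules of Illustration 1.2 can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the values of X, a, b, and c are hypothetical and were chosen only to exercise the identities.

```python
# Hypothetical values, chosen only for illustration.
X = [[8, 11, 9, 14],
     [10, 13, 11, 14],
     [12, 15, 10, 17]]      # K = 3 rows, N = 4 columns
a = [-1, 0, 1]              # one constant per row (a_i)
b = [2, 5, 3, 1]            # one constant per column (b_j)
c = 4                       # a single constant
K, N = len(X), len(X[0])

# Left side: add every Y_ij = X_ij + a_i + b_j + c term by term.
lhs = sum(X[i][j] + a[i] + b[j] + c for i in range(K) for j in range(N))

# Right side: sum of X + N * sum(a) + K * sum(b) + K * N * c.
sum_x = sum(sum(row) for row in X)
rhs = sum_x + N * sum(a) + K * sum(b) + K * N * c

# Row rule for the first row: sum_j Y_1j = sum_j X_1j + N*a_1 + sum(b) + N*c.
row1_lhs = sum(X[0][j] + a[0] + b[j] + c for j in range(N))
row1_rhs = sum(X[0]) + N * a[0] + sum(b) + N * c

print(lhs, rhs, row1_lhs, row1_rhs)
```

Each constant is multiplied by the number of times it is repeated in the sum, which is the whole content of the rule.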

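Illustration 1.3. We have noted that $\sum (X_iY_i)$ does not equal
$(\sum X_i)(\sum Y_i)$. (See (1) and (2) in Exercise 1.3, and (5) on Page 12). But,

$$\sum_{i}\sum_{j} X_iY_j = \Big(\sum_{i} X_i\Big)\Big(\sum_{j} Y_j\Big) \quad \text{where } i = 1, 2, \ldots, K \text{ and } j = 1, 2, \ldots, N.$$

This becomes clear if we write the terms of $\sum_{i}\sum_{j} X_iY_j$ in matrix format as follows:

$$
\begin{array}{ccccc|c}
 & & & & & \text{Row Totals} \\
X_1Y_1 & \cdots & X_1Y_j & \cdots & X_1Y_N & X_1\sum_{j} Y_j \\
\vdots & & \vdots & & \vdots & \vdots \\
X_iY_1 & \cdots & X_iY_j & \cdots & X_iY_N & X_i\sum_{j} Y_j \\
\vdots & & \vdots & & \vdots & \vdots \\
X_KY_1 & \cdots & X_KY_j & \cdots & X_KY_N & X_K\sum_{j} Y_j
\end{array}
$$

The sum of the terms in each row is shown at the right. The sum of these
row totals is $X_1\sum_j Y_j + \ldots + X_K\sum_j Y_j = (X_1 + \ldots + X_K)\sum_j Y_j = \sum_i X_i \sum_j Y_j$. One could
get the same final result by adding the columns first. Very often intermediate
summations are of primary interest.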
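As a small numerical illustration of this point, the sketch below (Python, with hypothetical values of X and Y that are not taken from the text) forms the row totals first and then compares the double sum with the product of the two single sums.

```python
# Hypothetical values; X has K = 4 entries and Y has N = 3 entries.
X = [2, 0, 5, 7]
Y = [1, 3, 4]

# Double sum over every pair (i, j), as in the matrix layout.
double_sum = sum(x * y for x in X for y in Y)

# Row totals: each row i adds up to X_i times the sum of the Y's.
row_totals = [x * sum(Y) for x in X]

# Product of the two single sums.
product_of_sums = sum(X) * sum(Y)

print(row_totals, double_sum, product_of_sums)
```

The row totals are the intermediate summations mentioned above; adding them reproduces the product of the sums.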

Exercise 1.11. Verify that $\sum_{i}\sum_{j} X_iY_j = (\sum_{i} X_i)(\sum_{j} Y_j)$ using the values of
X and Y in Exercise 1.3. In Exercise 1.3 the subscript of X and the subscript
of Y were the same index. In the expression $\sum_{i}\sum_{j} X_iY_j$ that is no longer
the case.

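Exercise 1.12. Prove the following:

(1)  $\sum_{i}^{K}\sum_{j}^{N} (a_iX_{ij}+b_j)^2 = \sum_{i}^{K} a_i^2 \sum_{j}^{N} X_{ij}^2 + 2\sum_{i}^{K} a_i \sum_{j}^{N} b_jX_{ij} + K\sum_{j}^{N} b_j^2$

(2)  $\sum_{i}^{K}\sum_{j}^{N} a_i(X_{ij}-\bar{X}_{i\cdot})^2 = \sum_{i}^{K} a_i \sum_{j}^{N} X_{ij}^2 - N\sum_{i}^{K} a_i\bar{X}_{i\cdot}^2$

(3)  $\sum_{i}^{K}\sum_{j}^{N} a_i(X_{ij}-\bar{X}_{i\cdot})(Y_{ij}-\bar{Y}_{i\cdot}) = \sum_{i}^{K} a_i \sum_{j}^{N} X_{ij}Y_{ij} - N\sum_{i}^{K} a_i\bar{X}_{i\cdot}\bar{Y}_{i\cdot}$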

1.5.2  HIERARCHAL  OR  NESTED  CLASSIFICATION 

A double index does not necessarily imply that a meaningful cross
classification of the data can be made. For example, $X_{ij}$ might represent
the value of X for the $j^{th}$ farm in the $i^{th}$ county. In this case, j simply
identifies a farm within a county. There is no correspondence, for example,
between farm number 5 in one county and farm number 5 in another. In fact
the total number of farms varies from county to county. Suppose there are
K counties and $N_i$ farms in the $i^{th}$ county. The total of X for the $i^{th}$
county could be expressed as $X_{i\cdot} = \sum_{j}^{N_i} X_{ij}$. In the present case $\sum_{i}^{K} X_{ij}$ is
meaningless. The total of all values of X is $\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}$.

When the classification is nested, the order of the subscripts
(indexes) and the order of the summation symbols from left to right should
be from the highest to lowest order of classification. Thus in the above
example the index for farms was on the right and the summation symbol
involving this index is also on the right. In the expression $\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}$,
summation with respect to i cannot take place before summation with regard
to j. On the other hand, when the classification is cross classification
the summations can be performed in either order.

In the example of K counties and $N_i$ farms in the $i^{th}$ county, and in
similar examples, you may think of the data as being arranged in rows (or
columns):

$$
\begin{matrix}
X_{11}, & X_{12}, & \ldots, & X_{1N_1} \\
X_{21}, & X_{22}, & \ldots, & X_{2N_2} \\
\vdots & & & \\
X_{K1}, & X_{K2}, & \ldots, & X_{KN_K}
\end{matrix}
$$


Here are two double sums taken apart for inspection:

$$(1) \quad \sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{\cdot\cdot})^2 = \sum_{j}^{N_1} (X_{1j}-\bar{X}_{\cdot\cdot})^2 + \ldots + \sum_{j}^{N_K} (X_{Kj}-\bar{X}_{\cdot\cdot})^2 \qquad (1.5)$$

$$\sum_{j}^{N_1} (X_{1j}-\bar{X}_{\cdot\cdot})^2 = (X_{11}-\bar{X}_{\cdot\cdot})^2 + \ldots + (X_{1N_1}-\bar{X}_{\cdot\cdot})^2$$

Equation (1.5) is the sum of squares of the deviations, $(X_{ij}-\bar{X}_{\cdot\cdot})$, of all
values of $X_{ij}$ from the overall mean. There are $\sum_{i}^{K} N_i$ values of $X_{ij}$, and

$$\bar{X}_{\cdot\cdot} = \frac{\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}}{\sum_{i}^{K} N_i}$$

If there was no interest in identifying the data by counties,
a single index would be sufficient. Equation (1.5) would then be $\sum_{i} (X_i-\bar{X})^2$.

$$(2) \quad \sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot})^2 = \sum_{j}^{N_1} (X_{1j}-\bar{X}_{1\cdot})^2 + \ldots + \sum_{j}^{N_K} (X_{Kj}-\bar{X}_{K\cdot})^2 \qquad (1.6)$$

$$\sum_{j}^{N_1} (X_{1j}-\bar{X}_{1\cdot})^2 = (X_{11}-\bar{X}_{1\cdot})^2 + \ldots + (X_{1N_1}-\bar{X}_{1\cdot})^2$$

With reference to Equation (1.6) do you recognize $\sum_{j}^{N_1} (X_{1j}-\bar{X}_{1\cdot})^2$? It involves
only the subset of elements for which i = 1, namely $X_{11}, X_{12}, \ldots, X_{1N_1}$. Note
that $\bar{X}_{1\cdot}$ is the average value of X in this subset. Hence, $\sum_{j}^{N_1} (X_{1j}-\bar{X}_{1\cdot})^2$ is
the sum of the squares of the deviations of the X's in this subset from the
subset mean. The double sum is the sum of K terms and each of the K terms
is a sum of squares for a subset of X's, the index for the subsets being i.

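Exercise 1.13. Let $X_{ij}$ represent the value of X for the $j^{th}$ farm in
the $i^{th}$ county. Also, let K be the number of counties and $N_i$ be the number
of farms in the $i^{th}$ county. Suppose the values of X are as follows:

   $X_{11} = 3$    $X_{12} = 1$    $X_{13} = 5$
   $X_{21} = 4$    $X_{22} = 6$
   $X_{31} = 0$    $X_{32} = 5$    $X_{33} = 1$    $X_{34} = 2$

Find the value of the following expressions:

(1)  $\sum_{i}^{K} N_i$    Answer: 9

(2)  $\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}$    Answer: 27

(3)  $X_{\cdot\cdot}$ and $\bar{X}_{\cdot\cdot}$    Answer: 27 and 3

(4)  $\sum_{j}^{N_1} X_{1j} = X_{1\cdot}$    Answer: 9

(5)  $X_{2\cdot}$ and $X_{3\cdot}$    Answer: 10 and 8

(6)  $\bar{X}_{1\cdot}$, $\bar{X}_{2\cdot}$, and $\bar{X}_{3\cdot}$    Answer: 3, 5, and 2

(7)  $\frac{\sum_{i} N_i\bar{X}_{i\cdot}}{\sum_{i} N_i}$    Answer: 3

(8)  $\sum_{i}^{K} \Big(\sum_{j}^{N_i} X_{ij}\Big)^2$ or $\sum_{i}^{K} X_{i\cdot}^2$    Answer: 245

(9)  $\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{\cdot\cdot})^2$    Answer: 36

(10) $\sum_{j}^{N_1} (X_{1j}-\bar{X}_{1\cdot})^2$

(11) $\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot})^2$    Answer: 8, 2, and 14 for i = 1, 2, and 3 respectively

(12) $\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot})^2$    Answer: 24

(13) $\sum_{i}^{K} N_i(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2$    Answer: 12

(14) $\sum_{i}^{K} \frac{\Big[\sum_{j}^{N_i} X_{ij}\Big]^2}{N_i} - \frac{\Big[\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}\Big]^2}{\sum_{i}^{K} N_i}$    Answer: 12

(15) $\sum_{i}^{K} N_i\bar{X}_{i\cdot}^2 - N\bar{X}_{\cdot\cdot}^2$    Answer: 12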


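Expressions (14) and (15) in Exercise 1.13 are symbolic representations
of the same thing. By definition

$$\sum_{j}^{N_i} X_{ij} = X_{i\cdot}, \quad \sum_{i}^{K}\sum_{j}^{N_i} X_{ij} = X_{\cdot\cdot}, \quad \text{and} \quad \sum_{i}^{K} N_i = N$$

Substitution in (14) gives

$$\sum_{i}^{K} \frac{X_{i\cdot}^2}{N_i} - \frac{X_{\cdot\cdot}^2}{N} \qquad (1.7)$$

Also by definition

$$\frac{X_{i\cdot}}{N_i} = \bar{X}_{i\cdot} \quad \text{and} \quad \frac{X_{\cdot\cdot}}{N} = \bar{X}_{\cdot\cdot}$$

Therefore $\frac{X_{i\cdot}^2}{N_i} = N_i\bar{X}_{i\cdot}^2$ and $\frac{X_{\cdot\cdot}^2}{N} = N\bar{X}_{\cdot\cdot}^2$. Hence, by substitution, Equation (1.7) becomes $\sum_{i}^{K} N_i\bar{X}_{i\cdot}^2 - N\bar{X}_{\cdot\cdot}^2$.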


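Exercise 1.14. Prove the following:

(1)  $\sum_{i}^{K}\sum_{j}^{N_i} X_{i\cdot}X_{ij} = \sum_{i}^{K} X_{i\cdot}^2$

(2)  $\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot}) = 0$

(3)  $\sum_{i}^{K} N_i(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2 = \sum_{i}^{K} N_i\bar{X}_{i\cdot}^2 - N\bar{X}_{\cdot\cdot}^2$

     Note that this equates (13) and (15) in Exercise 1.13.
     The proof is similar to the proof called for in part (5)
     of Exercise 1.5.

(4)  $\sum_{i}^{K}\sum_{j}^{N_i} (a_iX_{ij}-b_i)^2 = \sum_{i}^{K} a_i^2 \sum_{j}^{N_i} X_{ij}^2 - 2\sum_{i}^{K} a_ib_iX_{i\cdot} + \sum_{i}^{K} N_ib_i^2$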

1.6  THE  SQUARE  OF  A SUM 


In statistics, it is often necessary to work algebraically with the
square of a sum. For example,

$$(\sum X_i)^2 = (X_1+X_2+\ldots+X_N)^2 = X_1^2+X_1X_2+\ldots+X_2^2+X_2X_1+\ldots+X_N^2+X_NX_1+\ldots$$

The terms in the square of the sum can be written in matrix form as
follows:

$$
\begin{matrix}
X_1X_1 & X_1X_2 & \cdots & X_1X_j & \cdots & X_1X_N \\
X_2X_1 & X_2X_2 & \cdots & X_2X_j & \cdots & X_2X_N \\
\vdots & \vdots & & \vdots & & \vdots \\
X_iX_1 & X_iX_2 & \cdots & X_iX_j & \cdots & X_iX_N \\
\vdots & \vdots & & \vdots & & \vdots \\
X_NX_1 & X_NX_2 & \cdots & X_NX_j & \cdots & X_NX_N
\end{matrix}
$$

The general term in this matrix is $X_iX_j$ where $X_i$ and $X_j$ come from the same
set of X's, namely, $X_1, \ldots, X_N$. Hence, i and j are indexes of the same set.
Note that the terms along the main diagonal are the squares of the values
of X and could be written as $\sum X_i^2$. That is, on the main diagonal i = j
and $X_iX_j = X_iX_i = X_i^2$. The remaining terms are all products of one value
of X with some other value of X. For these terms the indexes are never
equal. Therefore, the sum of all terms not on the main diagonal can be
expressed as $\sum_{i \ne j} X_iX_j$ where $i \ne j$ is used to express the fact that the summation
includes all terms where i is not equal to j, that is, all terms other
than those on the main diagonal. Hence, we have shown that

$$(\sum X_i)^2 = \sum X_i^2 + \sum_{i \ne j} X_iX_j$$

Notice the symmetry of terms above and below the main diagonal:
$X_1X_2 = X_2X_1$, $X_1X_3 = X_3X_1$, etc. When symmetry like this occurs, instead of
$\sum_{i \ne j} X_iX_j$ you might see an equivalent expression $2\sum_{i<j} X_iX_j$. The sum of all
terms above the main diagonal is $\sum_{i<j} X_iX_j$. Owing to the symmetry, the sum
of the terms below the main diagonal is the same. Therefore,

$$\sum_{i \ne j} X_iX_j = 2\sum_{i<j} X_iX_j$$
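The decomposition of the square of a sum into diagonal and off-diagonal terms can be checked numerically. The following is a minimal sketch in Python; the four values of X are hypothetical and are not the values used in the exercises.

```python
from itertools import combinations

# Hypothetical values of X_1, ..., X_N, chosen only for illustration.
X = [3, 1, 4, 2]
n = len(X)

square_of_sum = sum(X) ** 2

# Main diagonal of the matrix layout: the squares X_i * X_i.
diagonal = sum(x * x for x in X)

# All terms with i != j (everything off the main diagonal).
off_diagonal = sum(X[i] * X[j] for i in range(n) for j in range(n) if i != j)

# Terms above the main diagonal only (i < j).
upper = sum(xi * xj for xi, xj in combinations(X, 2))

print(square_of_sum, diagonal, off_diagonal, upper)
```

Here `combinations(X, 2)` lists each unordered pair once, which corresponds to the terms with i < j.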

Exercise 1.15. Express the terms of $[\sum_{i=1}^{4} X_i]^2 = [X_1+X_2+X_3+X_4]^2$ in
matrix format. Let $X_1 = 2$, $X_2 = 0$, $X_3 = 5$, and $X_4 = 7$. Compute the values
of $\sum X_i^2$, $2\sum_{i<j} X_iX_j$, and $[\sum X_i]^2$. Show that $[\sum X_i]^2 = \sum X_i^2 + 2\sum_{i<j} X_iX_j$.


An important result, which we will use in Chapter 3, follows from the
fact that

$$[\sum X_i]^2 = \sum X_i^2 + \sum_{i \ne j} X_iX_j \qquad (1.8)$$

Let $X_i = Y_i-\bar{Y}$. Substituting $(Y_i-\bar{Y})$ for $X_i$ in Equation (1.8) we have

$$[\sum (Y_i-\bar{Y})]^2 = \sum (Y_i-\bar{Y})^2 + \sum_{i \ne j} (Y_i-\bar{Y})(Y_j-\bar{Y})$$

We know that $[\sum (Y_i-\bar{Y})]^2 = 0$ because $\sum (Y_i-\bar{Y}) = 0$. Therefore,

$$\sum (Y_i-\bar{Y})^2 + \sum_{i \ne j} (Y_i-\bar{Y})(Y_j-\bar{Y}) = 0$$

It follows that

$$\sum_{i \ne j} (Y_i-\bar{Y})(Y_j-\bar{Y}) = -\sum (Y_i-\bar{Y})^2 \qquad (1.9)$$
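Equation (1.9) is easy to confirm on a small example. The sketch below is in Python and uses three hypothetical values of Y; it is an illustration under those assumed values, not part of the original derivation.

```python
import math

# Hypothetical values of Y_1, ..., Y_N.
Y = [2, 4, 9]
n = len(Y)
mean = sum(Y) / n

# Deviations from the mean; they always add to zero.
dev = [y - mean for y in Y]

# Sum of products of deviations over all pairs with i != j.
cross_products = sum(dev[i] * dev[j] for i in range(n) for j in range(n) if i != j)

# Sum of squared deviations.
sum_of_squares = sum(d * d for d in dev)

print(cross_products, -sum_of_squares)
```

Because the deviations add to zero, the off-diagonal products must cancel the diagonal squares exactly.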

Exercise 1.16. Consider

$$\sum_{i \ne j} (Y_i-\bar{Y})(Y_j-\bar{Y}) = \sum_{i \ne j} (Y_iY_j - \bar{Y}Y_i - \bar{Y}Y_j + \bar{Y}^2)$$

$$= \sum_{i \ne j} Y_iY_j - \bar{Y}\sum_{i \ne j} Y_i - \bar{Y}\sum_{i \ne j} Y_j + \sum_{i \ne j} \bar{Y}^2$$

Do you agree that $\sum_{i \ne j} \bar{Y}^2 = N(N-1)\bar{Y}^2$? With reference to the matrix layout,
$\bar{Y}^2$ appears $N^2$ times but the specification is $i \ne j$ so we do not want to
count the N times that $\bar{Y}^2$ is on the main diagonal. Try finding the values
of $\sum_{i \ne j} Y_i$ and $\sum_{i \ne j} Y_j$ and then show that

$$\sum_{i \ne j} (Y_i-\bar{Y})(Y_j-\bar{Y}) = \sum_{i \ne j} Y_iY_j - N(N-1)\bar{Y}^2$$

Hint: Refer to a matrix layout. In $\sum_{i \ne j} Y_i$ how many times does $Y_1$ appear?
Does $Y_2$ appear the same number of times?

1.7  SUMS  OF  SQUARES 

For  various  reasons  statisticians  are  interested  in  components  of 
variation,  that  is,  measuring  the  amount  of  variation  attributable  to  each 
of  more  than  one  source.  This  involves  computing  sums  of  squares  that 
correspond  to  the  different  sources  of  variation  that  are  of  interest. 

We  will  discuss  a simple  example  of  nested  classification  and  a simple 
example  of  cross  classification. 

1.7.1  NESTED  CLASSIFICATION 

To be somewhat specific, reference is made to the example of K counties
and $N_i$ farms in the $i^{th}$ county. The sum of the squares of the deviations
of $X_{ij}$ from $\bar{X}_{\cdot\cdot}$ can be divided into two parts as shown by the following
formula:

$$\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{\cdot\cdot})^2 = \sum_{i}^{K} N_i(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2 + \sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot})^2 \qquad (1.10)$$

The quantity on the left-hand side of Equation (1.10) is called the
total sum of squares. In Exercise 1.13, Part (9), the total sum of squares
was 36.

The first quantity on the right-hand side of the equation involves the
squares of $(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})$, which are deviations of the class means from the overall
mean. It is called the between class sum of squares or with reference
to the example the between county sum of squares. In Exercise 1.13,
Part (13), the between county sum of squares was computed. The answer was
12.

The last term is called the within sum of squares because it involves
deviations  within  the  classes  from  the  class  means.  It  was  presented 
previously.  See  Equation  (1.6)  and  the  discussion  pertaining  to  it.  In 
Exercise  1.13,  the  within  class  sum  of  squares  was  24,  which  was  calculated 
in  Part  (12).  Thus,  from  Exercise  1.13,  we  have  the  total  sum  of  squares, 
36,  which  equals  the  between,  12,  plus  the  within,  24.  This  verifies 
Equation  (1.10). 
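The verification just described can be repeated in a few lines of Python. This is a sketch under an assumption: the county values below are hypothetical stand-ins, picked so that the three sums of squares come out to the totals 36, 12, and 24 quoted above.

```python
import math

# Assumed values: three counties with 3, 2, and 4 farms respectively.
counties = [[3, 1, 5], [4, 6], [0, 5, 1, 2]]

n = sum(len(c) for c in counties)                   # total number of farms
grand_mean = sum(sum(c) for c in counties) / n      # overall mean
class_means = [sum(c) / len(c) for c in counties]   # one mean per county

# Total: deviations of every value from the overall mean.
total_ss = sum((x - grand_mean) ** 2 for c in counties for x in c)

# Between: deviations of county means from the overall mean, weighted by N_i.
between_ss = sum(len(c) * (m - grand_mean) ** 2
                 for c, m in zip(counties, class_means))

# Within: deviations of every value from its own county mean.
within_ss = sum((x - m) ** 2
                for c, m in zip(counties, class_means) for x in c)

print(total_ss, between_ss, within_ss)
```

The unequal class sizes are handled by the weight `len(c)` in the between term, which is the role of $N_i$ in Equation (1.10).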

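The proof of Equation (1.10) is easy if one gets started correctly.
Write $X_{ij}-\bar{X}_{\cdot\cdot} = (X_{ij}-\bar{X}_{i\cdot}) + (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})$. This simple technique of adding and
subtracting divides the deviation $(X_{ij}-\bar{X}_{\cdot\cdot})$ into two parts. The proof
proceeds as follows:

$$\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{\cdot\cdot})^2 = \sum_{i}\sum_{j} [(X_{ij}-\bar{X}_{i\cdot}) + (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})]^2$$

$$= \sum_{i}\sum_{j} [(X_{ij}-\bar{X}_{i\cdot})^2 + 2(X_{ij}-\bar{X}_{i\cdot})(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot}) + (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2]$$

$$= \sum_{i}\sum_{j} (X_{ij}-\bar{X}_{i\cdot})^2 + 2\sum_{i}\sum_{j} (X_{ij}-\bar{X}_{i\cdot})(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot}) + \sum_{i}\sum_{j} (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2$$

Exercise 1.17. Show that $\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot})(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot}) = 0$

and that $\sum_{i}^{K}\sum_{j}^{N_i} (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2 = \sum_{i}^{K} N_i(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2$

Completion of Exercise 1.17 completes the proof.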

Equation (1.10) is written in a form which displays its meaning rather
than in a form that is most useful for computational purposes. For computation
purposes, the following relationships are commonly used:

$$\text{Total} = \sum_{i}\sum_{j} (X_{ij}-\bar{X}_{\cdot\cdot})^2 = \sum_{i}\sum_{j} X_{ij}^2 - N\bar{X}_{\cdot\cdot}^2$$

$$\text{Between} = \sum_{i} N_i(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2 = \sum_{i} N_i\bar{X}_{i\cdot}^2 - N\bar{X}_{\cdot\cdot}^2$$

$$\text{Within} = \sum_{i}\sum_{j} (X_{ij}-\bar{X}_{i\cdot})^2 = \sum_{i}\sum_{j} X_{ij}^2 - \sum_{i} N_i\bar{X}_{i\cdot}^2$$

where $N = \sum_{i} N_i$, $\bar{X}_{i\cdot} = \frac{\sum_{j}^{N_i} X_{ij}}{N_i}$, and $\bar{X}_{\cdot\cdot} = \frac{\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}}{N}$.

Notice that the major part of arithmetic reduces to calculating $\sum_{i}^{K}\sum_{j}^{N_i} X_{ij}^2$,
$\sum_{i}^{K} N_i\bar{X}_{i\cdot}^2$, and $N\bar{X}_{\cdot\cdot}^2$. There are variations of this that one might use. For
example, one could use $\sum_{i}^{K} \frac{X_{i\cdot}^2}{N_i}$ instead of $\sum_{i}^{K} N_i\bar{X}_{i\cdot}^2$.

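Exercise 1.18. Show that

$$\sum_{i}^{K}\sum_{j}^{N_i} (X_{ij}-\bar{X}_{i\cdot})^2 = \sum_{i}\sum_{j} X_{ij}^2 - \sum_{i} N_i\bar{X}_{i\cdot}^2$$

A special case that is useful occurs when $N_i = 2$. The within sum of
squares becomes

$$\sum_{i}^{K}\sum_{j}^{2} (X_{ij}-\bar{X}_{i\cdot})^2 = \sum_{i}^{K} [(X_{i1}-\bar{X}_{i\cdot})^2 + (X_{i2}-\bar{X}_{i\cdot})^2]$$

Since $\bar{X}_{i\cdot} = \frac{X_{i1}+X_{i2}}{2}$ it is easy to show that

$$(X_{i1}-\bar{X}_{i\cdot})^2 = \frac{1}{4}(X_{i1}-X_{i2})^2$$

and

$$(X_{i2}-\bar{X}_{i\cdot})^2 = \frac{1}{4}(X_{i1}-X_{i2})^2$$

Therefore the within sum of squares is

$$\frac{1}{2}\sum_{i}^{K} (X_{i1}-X_{i2})^2$$

which is a convenient form for computation.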
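As a quick check of this special case, the sketch below (Python, with hypothetical pairs of values) computes the within sum of squares both from the class means and from the paired differences.

```python
import math

# Hypothetical data: K = 3 classes with exactly two values each.
pairs = [(3, 1), (4, 6), (0, 5)]

# Direct form: squared deviations of each value from its class mean.
direct = sum((x1 - (x1 + x2) / 2) ** 2 + (x2 - (x1 + x2) / 2) ** 2
             for x1, x2 in pairs)

# Convenient form: one half of the sum of squared differences.
shortcut = 0.5 * sum((x1 - x2) ** 2 for x1, x2 in pairs)

print(direct, shortcut)
```

The convenient form never computes a class mean, which is why it is attractive for hand calculation.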
1.7.2  CROSS  CLASSIFICATION

Reference is made to the matrix on Page 15 and to Exercise 1.8. The
total sum of squares can be divided into three parts as shown by the
following formula:

$$\sum_{i}^{K}\sum_{j}^{N} (X_{ij}-\bar{X}_{\cdot\cdot})^2 = N\sum_{i}^{K} (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2 + K\sum_{j}^{N} (\bar{X}_{\cdot j}-\bar{X}_{\cdot\cdot})^2 + \sum_{i}^{K}\sum_{j}^{N} (X_{ij}-\bar{X}_{i\cdot}-\bar{X}_{\cdot j}+\bar{X}_{\cdot\cdot})^2 \qquad (1.11)$$

Turn to Exercise 1.8 and find the total sum of squares and the three
parts. They are:

                 Sum of Squares
   Total               78
   Rows                18
   Columns             54
   Remainder            6

The  three  parts  add  to  the  total  which  verifies  Equation  (1.11).  In 
Exercise  1.8,  the  sum  of  squares  called  remainder  was  computed  directly 
(see  Part  (10)  of  Exercise  1.8).  In  practice,  the  remainder  sum  of  squares 
is  usually  obtained  by  subtracting  the  row  and  column  sum  of  squares  from 
the  total. 
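The four sums of squares can be reproduced with a short computation. The following Python sketch uses the 3 by 4 matrix of values from Exercise 1.8; the variable names are illustrative choices, not notation from the text.

```python
import math

# Values of X_ij from Exercise 1.8 (K = 3 rows, N = 4 columns).
X = [[8, 11, 9, 14],
     [10, 13, 11, 14],
     [12, 15, 10, 17]]
K, N = len(X), len(X[0])

grand = sum(map(sum, X)) / (K * N)
row_means = [sum(row) / N for row in X]
col_means = [sum(X[i][j] for i in range(K)) / K for j in range(N)]

total = sum((X[i][j] - grand) ** 2 for i in range(K) for j in range(N))
rows = N * sum((m - grand) ** 2 for m in row_means)
cols = K * sum((m - grand) ** 2 for m in col_means)

# Remainder computed directly, then by subtraction as is usual in practice.
remainder = sum((X[i][j] - row_means[i] - col_means[j] + grand) ** 2
                for i in range(K) for j in range(N))
remainder_by_subtraction = total - rows - cols

print(total, rows, cols, remainder, remainder_by_subtraction)
```

Both routes to the remainder agree, which is the numerical content of Equation (1.11).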

Again, the proof of Equation (1.11) is not difficult if one makes the
right start. In this case the deviation, $(X_{ij}-\bar{X}_{\cdot\cdot})$, is divided into three
parts by adding and subtracting $\bar{X}_{i\cdot}$ and $\bar{X}_{\cdot j}$ as follows:

$$(X_{ij}-\bar{X}_{\cdot\cdot}) = (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot}) + (\bar{X}_{\cdot j}-\bar{X}_{\cdot\cdot}) + (X_{ij}-\bar{X}_{i\cdot}-\bar{X}_{\cdot j}+\bar{X}_{\cdot\cdot}) \qquad (1.12)$$

Exercise 1.19. Prove Equation (1.11) by squaring both sides of Equation
(1.12) and then doing the summation. The proof is mostly a matter of
showing that the sums of the terms which are products (not squares) are zero.
For example, showing that $\sum_{i}^{K}\sum_{j}^{N} (\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})(X_{ij}-\bar{X}_{i\cdot}-\bar{X}_{\cdot j}+\bar{X}_{\cdot\cdot}) = 0$.


CHAPTER  II.  RANDOM  VARIABLES  AND  PROBABILITY 


2.1  RANDOM  VARIABLES 

The  word  "random"  has  a wide  variety  of  meanings.  Its  use  in  such 
terms  as  "random  events,"  "random  variable,"  or  "random  sample,"  however, 
implies  a random  process  such  that  the  probability  of  an  event  occurring 
is  known  a priori.  To  select  a random  sample  of  elements  from  a population, 
tables  of  random  numbers  are  used.  There  are  various  ways  of  using  such 
tables  to  make  a random  selection  so  any  given  element  will  have  a specified 
probability  of  being  selected. 

The  theory  of  probability  sampling  is  founded  on  the  concept  of  a 
random  variable  which  is  a variable  that,  by  chance,  might  equal  any  one 
of a defined set of values. The value of a random variable on any particular
occasion is determined by a random process in such a way that the
chance (probability) of its being equal to any specified value in the set
is known. This is in accord with the definition of a probability sample
which states that every element of the population must have a known probability
(greater than zero) of being selected. A primary purpose of this
chapter  is  to  present  an  elementary,  minimum  introduction  or  review  of 
probability  as  background  for  the  next  chapter  on  expected  values  of  a 
random  variable.  This  leads  to  a theoretical  basis  for  sampling  and  for 
evaluating  the  accuracy  of  estimates  from  a probability-sample  survey. 

In sampling theory, we usually start with an assumed population of N
elements and a measurement for each element of some characteristic X. A
typical mathematical representation of the N measurements or values is
$X_1, \ldots, X_i, \ldots, X_N$ where $X_i$ is the value of the characteristic X for the $i^{th}$
element. Associated with the $i^{th}$ element is a probability $P_i$, which is the
probability of obtaining it when one element is selected at random from the
set of N. The $P_i$'s will be called selection probabilities. If each
element has an equal chance of selection, $P_i = \frac{1}{N}$. The $P_i$'s need not be
equal, but we will specify that each $P_i > 0$. When referring to the probability
of X being equal to $X_i$ we will use $P(X_i)$ instead of $P_i$.

We  need  to  be  aware  of  a distinction  between  selection  probability 
and  inclusion  probability,  the  latter  being  the  probability  of  an  element 
being  included  in  a sample.  In  this  chapter,  much  of  the  discussion  is 
oriented  toward  selection  probabilities  because  of  its  relevance  to  finding 
expected  values  of  estimates  from  samples  of  various  kinds. 

Definition 2.1. A random variable is a variable that can equal any
value $X_i$, in a defined set, with a probability $P(X_i)$.

When an element is selected at random from a population and a measurement
of a characteristic of it is made, the value obtained is a random
variable.  As  we  shall  see  later,  if  a sample  of  elements  is  selected  at 
random  from  a population,  the  sample  average  and  other  quantities  calculated 
from  the  sample  are  random  variables. 

Illustration 2.1. One of the most familiar examples of a random
variable  is  the  number  of  dots  that  happen  to  be  on  the  top  side  of  a die 
when  it  comes  to  rest  after  a toss.  This  also  illustrates  the  concept  of 
probability  that  we  are  interested  in;  namely,  the  relative  frequency  with 
which  a particular  outcome  will  occur  in  reference  to  a defined  set  of 
possible  outcomes.  With  a die  there  are  six  possible  outcomes  and  we  expect 
each  to  occur  with  the  same  frequency,  1/6,  assuming  the  die  is  tossed  a 
very  large  or  infinite  number  of  times.  Implicit  in  a statement  that  each 
side  of  a die  has  a probability  of  1/6  of  being  the  top  side  are  some 
assumptions about the physical structure of the die and the "randomness"
of the toss.
The additive and multiplicative laws of probability can be stated in
several ways depending upon the context in which they are to be used. In
sampling, our interest is primarily in the outcome of one random selection
or of a series of random selections that yields a probability sample.
Hence, the rules or theorems for the addition or multiplication of probabilities
will be stated or discussed only in the context of probability
sampling.

2.2  ADDITION  OF  PROBABILITIES 

Assume a population of N elements and a variable X which has a value $X_i$
for the $i^{th}$ element. That is, we have a set of values of X, namely
$X_1, \ldots, X_i, \ldots, X_N$. Let $P_1, \ldots, P_i, \ldots, P_N$ be a set of selection probabilities
where $P_i$ is the probability of selecting the $i^{th}$ element when a random
selection is made. We specify that each $P_i$ must be greater than zero and
that $\sum_{i}^{N} P_i = 1$. When an element is selected at random, the probability that
it is either the $i^{th}$ element or the $j^{th}$ element is $P_i + P_j$. This addition
rule can be stated more generally. Let $P_s$ be the sum of the selection
probabilities for the elements in a subset of the N elements. When a random
selection is made from the whole set, $P_s$ is the probability that the element
selected is from the subset and $1-P_s$ is the probability that it is not from
the subset. With reference to the variable X, let $P(X_i)$ represent the
probability that X equals $X_i$. Then $P(X_i)+P(X_j)$ represents the probability
that X equals either $X_i$ or $X_j$; and $P_s(X)$ could be used to represent the
probability that X is equal to one of the values in the subset.

Before  adding  (or  subtracting)  probabilities  one  should  determine 
whether  the  events  are  mutually  exclusive  and  whether  all  possible  events 
have  been  accounted  for.  Consider  two  subsets  of  elements,  subset  A and 
subset B, of a population of N elements. Suppose one element is selected
at random. What is the probability that the selected element is a member
of either subset A or subset B? Let P(A) be the probability that the
selected element is from subset A; that is, P(A) is the sum of the selection
probabilities for elements in subset A. P(B) is defined similarly.

If  the  two  subsets  are  mutually  exclusive,  which  means  that  no  element  is 
in  both  subsets,  the  probability  that  the  element  selected  is  from  either 
subset  A or  subset  B is  P(A)  + P(B).  If  some  elements  are  in  both  subsets, 
see  Figure  2.1,  then  event  A (which  is  the  selected  element  being  a member 
of  subset  A)  and  event  B (which  is  the  selected  element  being  a member  of 
subset  B)  are  not  mutually  exclusive  events.  Elements  included  in  both 
subsets  are  counted  once  in  P(A)  and  once  in  P(B).  Therefore,  we  must 
subtract  P(A,B)  from  P(A)  + P(B)  where  P(A,B)  is  the  sum  of  the  probabilities 
for  the  elements  that  belong  to  both  subset  A and  subset  B.  Thus, 

P(A or B) = P(A) + P(B) - P(A,B)


Figure  2.1 


To summarize, the additive law of probability as used above could be
stated as follows: If A and B are subsets of a set of all possible outcomes
that could occur as a result of a random trial or selection, the probability
that the outcome is in subset A or in subset B is equal to the probability
that the outcome is in A plus the probability that it is in B minus the
probability that it is in both A and B.

The additive law of probability extends without difficulty to three
or more subsets. Draw a figure like Figure 2.1 with three subsets so that
some points are common to all three subsets. Observe that the additive
law extends to three subsets as follows:

P(A or B or C) = P(A) + P(B) + P(C) - P(A,B) - P(A,C) - P(B,C) + P(A,B,C)
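The three-subset form of the additive law can be checked by direct enumeration. The sketch below is in Python and rests on assumptions made only for illustration: a population of 12 elements with equal selection probabilities, and three overlapping subsets A, B, and C chosen arbitrarily.

```python
from fractions import Fraction

# Assumed population: 12 elements, each with selection probability 1/12.
N = 12
p = {element: Fraction(1, N) for element in range(N)}

# Hypothetical overlapping subsets; element 4 belongs to all three.
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 7, 8}

def prob(subset):
    """Sum of the selection probabilities of the elements in the subset."""
    return sum(p[e] for e in subset)

# Probability that the selected element is in A or B or C, counted once each.
direct = prob(A | B | C)

# The same probability from the additive law for three subsets.
by_law = (prob(A) + prob(B) + prob(C)
          - prob(A & B) - prob(A & C) - prob(B & C)
          + prob(A & B & C))

print(direct, by_law)
```

Exact fractions are used so that the two results can be compared without rounding error.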


As a case for further discussion purposes, assume a population of N
elements and two criteria for classification. A two-way classification of
the elements could be displayed in the format of Table 2.1.

Table 2.1. A two-way classification of N elements

$$
\begin{array}{c|ccccc|c}
 & \multicolumn{5}{c|}{\text{X class}} & \\
\text{Y class} & 1 & \cdots & j & \cdots & s & \text{Total} \\
\hline
1 & N_{11}, P_{11} & \cdots & N_{1j}, P_{1j} & \cdots & N_{1s}, P_{1s} & N_{1\cdot}, P_{1\cdot} \\
\vdots & & & & & & \\
i & N_{i1}, P_{i1} & \cdots & N_{ij}, P_{ij} & \cdots & N_{is}, P_{is} & N_{i\cdot}, P_{i\cdot} \\
\vdots & & & & & & \\
t & N_{t1}, P_{t1} & \cdots & N_{tj}, P_{tj} & \cdots & N_{ts}, P_{ts} & N_{t\cdot}, P_{t\cdot} \\
\hline
\text{Total} & N_{\cdot 1}, P_{\cdot 1} & \cdots & N_{\cdot j}, P_{\cdot j} & \cdots & N_{\cdot s}, P_{\cdot s} & N, P = 1
\end{array}
$$

The columns represent a classification of the elements in terms of criterion
X; the rows represent a classification in terms of criterion Y; $N_{ij}$ is the
number of elements in X class j and Y class i; and $P_{ij}$ is the sum of the
selection probabilities for the elements in X class j and Y class i. Any
one of the N elements can be classified in one and only one of the t times
s cells.

Suppose one element from the population of N is selected. According
to the additive law of probability we can state that

   $\sum_{i} P_{ij} = P_{\cdot j}$ is the probability that the element selected is from
   X class j, and

   $\sum_{j} P_{ij} = P_{i\cdot}$ is the probability that the element selected is from
   Y class i, where

   $P_{ij}$ is the probability that the element selected is from
   (belongs to both) X class j and Y class i.

The probabilities $P_{\cdot j}$ and $P_{i\cdot}$ are called marginal probabilities.

The probability that one randomly selected element is from X class
j or from Y class i is $P_{\cdot j} + P_{i\cdot} - P_{ij}$. (The answer is not $P_{\cdot j} + P_{i\cdot}$ because
in $P_{\cdot j} + P_{i\cdot}$ there are $N_{ij}$ elements in X class j and Y class i that are
counted twice.)

If the probabilities of selection are equal, $P_{ij} = \frac{N_{ij}}{N}$, $P_{\cdot j} = \frac{N_{\cdot j}}{N}$,
and $P_{i\cdot} = \frac{N_{i\cdot}}{N}$.


Illustration 2.2. Suppose there are 5,000 students in a university.
Assume there are 1,600 freshmen, 1,400 sophomores, and 500 students living
in dormitory A. From a list of the 5,000 students, one student is selected
at random. Assuming each student had an equal chance of selection, the
probability that the selected student is a freshman is $\frac{1600}{5000}$, that he is a
sophomore is $\frac{1400}{5000}$, and that he is either a freshman or a sophomore is
$\frac{1600}{5000} + \frac{1400}{5000}$. Also, the probability that the selected student lives in dormitory A
is $\frac{500}{5000}$. But, what is the probability that the selected student is either
a freshman or lives in dormitory A? The question involves two classifications:
one pertaining to the student's class and the other to where the
student lives. The information given about the 5000 students could be
arranged as follows:

                                Class
   Dormitory    Freshmen    Sophomores    Others    Total
   A                                                  500
   Other                                             4500
   Total          1600         1400        2000      5000

From the above format, one can readily observe that the answer to the question
depends upon how many freshmen live in dormitory A. If the problem
had stated that 200 freshmen live in dormitory A, the answer would have
been $\frac{1600}{5000} + \frac{500}{5000} - \frac{200}{5000}$.
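The arithmetic of this answer is short enough to write out. The sketch below is in Python and takes the figure of 200 freshmen in dormitory A as the hypothetical count mentioned above; the variable names are illustrative.

```python
from fractions import Fraction

students = 5000
freshmen = 1600
dorm_a = 500
freshmen_in_dorm_a = 200   # hypothetical count, as supposed in the text

# Additive law: P(freshman or dorm A) = P(freshman) + P(dorm A) - P(both).
p_either = (Fraction(freshmen, students)
            + Fraction(dorm_a, students)
            - Fraction(freshmen_in_dorm_a, students))

print(p_either, float(p_either))
```

Under that assumed count the probability is 19/50, or 0.38.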

Statements  about  probability  need  to  be  made  and  interpreted  with 
great  care.  For  example,  it  is  not  correct  to  say  that  a student  has  a 
probability  of  0.1  of  living  in  dormitory  A simply  because  500  students  out 
of  5000  live  in  A.  Unless  students  are  assigned  to  dormitories  by  a random 
process  with  known  probabilities  there  is  no  basis  for  stating  a student's 
probability of living in (being assigned to) dormitory A. We are considering
the outcome of a random selection.

Exercise 2.1. Suppose one has the following information about a
population of 1000 farms:

   600 produce corn
   500 produce soybeans
   300 produce wheat
   100 produce wheat and corn
   200 have one or more cows
   all farms that have cows also produce corn
   200 farms do not produce any crops

One farm is selected at random with equal probability from the list
of 1000. What is the probability that the selected farm,

(a)  produces corn?  Answer: 0.6

(b)  does not produce wheat?

(c)  produces corn but no wheat?  Answer: 0.5

(d)  produces corn or wheat but not both?

(e)  has no cows?  Answer: 0.8

(f)  produces corn or soybeans?

(g)  produces corn and has no cows?  Answer: 0.4

(h)  produces either corn, cows, or both?

(i)  does not produce corn or wheat?

One of the above questions cannot be answered.

Exercise 2.2. Assume a population of 10 elements and selection
probabilities as follows:

   Element    $X_i$    $P_i$        Element    $X_i$    $P_i$
      1         2      .05             6        11      .15
      2         7      .10             7         2      .20
      3        12      .08             8         8      .05
      4         0      .02             9         6      .05
      5         8      .20            10         3      .10

One element is selected at random with probability $P_i$.

Find:

(a)  $P(X = 2)$, the probability that X = 2.

(b)  $P(X > 10)$, the probability that X is greater than 10.

(c)  $P(X \le 2)$, the probability that X is equal to or less than 2.

(d)  $P(3 < X < 10)$, the probability that X is greater than 3 and less
     than 10.

(e)  $P(X \le 3 \text{ or } X \ge 10)$, the probability that X is either equal to or less
     than 3 or is equal to or greater than 10.

Note: The answer to (d) and the answer to (e) should add to 1.
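The probability of any event defined on X is found by adding the selection probabilities of the elements that satisfy it. The following Python sketch applies that idea to the table of Exercise 2.2 and checks the note about parts (d) and (e); the helper name `prob` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction

# (X_i, P_i) for the 10 elements of Exercise 2.2; probabilities kept exact.
population = [(2, "0.05"), (7, "0.10"), (12, "0.08"), (0, "0.02"), (8, "0.20"),
              (11, "0.15"), (2, "0.20"), (8, "0.05"), (6, "0.05"), (3, "0.10")]

def prob(event):
    """Add the selection probabilities of elements whose X satisfies event."""
    return sum(Fraction(p) for x, p in population if event(x))

total = prob(lambda x: True)                 # all elements
p_d = prob(lambda x: 3 < x < 10)             # part (d)
p_e = prob(lambda x: x <= 3 or x >= 10)      # part (e)

print(total, p_d + p_e)
```

Events (d) and (e) are mutually exclusive and together cover every possible value of X, so their probabilities add to 1.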

So far, we have been discussing the probability of an event occurring as
a result of a single random selection. When more than one random selection
occurs simultaneously or in succession the multiplicative law of probability
is useful.

2.3  MULTIPLICATION  OF  PROBABILITIES 

Assume a population of N elements and selection probabilities
$P_1, \ldots, P_i, \ldots, P_N$. Each $P_i$ is greater than zero and $\sum_{i}^{N} P_i = 1$. Suppose
two elements are selected but before the second selection is made the
first element selected is returned to the population. In this case the
outcome of the first selection does not change the selection probabilities
for the second selection. The two selections (events) are independent.
The probability of selecting the $i^{th}$ element first and the $j^{th}$ element
second is $P_iP_j$, the product of the selection probabilities $P_i$ and $P_j$.
If a selected element is not returned to the population before the next
selection is made, the selection probabilities for the next selection are
changed. The selections are dependent.



The multiplicative law of probability, for two independent events
A and B, states that the joint probability of A and B happening in the
order A,B is equal to the probability that A happens times the
probability that B happens. In equation form, $P(AB) = P(A)P(B)$. For the
order B,A, $P(BA) = P(B)P(A)$ and we note that $P(AB) = P(BA)$. Remember,
independence means that the probability of B happening is not affected
by the occurrence of A and vice versa. The multiplicative law extends
to any number of independent events. Thus, $P(ABC) = P(A)P(B)P(C)$.

For two dependent events A and B, the multiplicative law states that
the joint probability of A and B happening in the order A,B is equal to
the probability of A happening times the probability that B happens under
the condition that A has already happened. In equation form,
$P(AB) = P(A)P(B|A)$; or for the order B,A we have $P(BA) = P(B)P(A|B)$.
The vertical bar can usually be translated as "given" or "given that."
The notation on the left of the bar refers to the event under
consideration and the notation on the right to a condition under which
the event can take place.

$P(B|A)$ is called conditional probability and could be read "the
probability of B, given that A has already happened," or simply "the
probability of B given A." When the events are independent,
$P(B|A) = P(B)$; that is, the conditional probability of B occurring is
the same as the unconditional probability of B. Extending the
multiplication rule to a series of three events A,B,C occurring in that
order, we have $P(ABC) = P(A)P(B|A)P(C|AB)$ where $P(C|AB)$ is the
probability of C occurring, given that A and B have already occurred.

2.4  SAMPLING  WITH  REPLACEMENT 

When a sample is drawn and each selected element is returned to the
population before the next selection is made, the method of sampling is
called "sampling with replacement." In this case, the outcome of one
selection does not change the selection probabilities for another
selection.

Suppose a sample of n elements is selected with replacement. Let the
values of X in the sample be $x_1, x_2, \ldots, x_n$ where $x_1$ is the
value of X obtained on the first selection, $x_2$ the value obtained on
the second selection, etc. Notice that $x_1$ is a random variable that
could be equal to any value in the population set of values
$X_1, X_2, \ldots, X_N$, and the probability that $x_1$ equals $X_i$ is
$P_i$. The same statement applies to $x_2$, etc. Since the selections are
independent, the probability of getting a sample of n in a particular
order is the product of the selection probabilities, namely,
$p(x_1)p(x_2) \cdots p(x_n)$ where $p(x_1)$ is the $P_i$ for the element
selected on the first draw, $p(x_2)$ is the $P_i$ for the element selected
on the second draw, etc.

Illustration 2.3. As an illustration, consider a sample of two
elements selected with equal probability and with replacement from a
population of four elements. Suppose the values of some characteristic X
for the four elements are $X_1$, $X_2$, $X_3$, and $X_4$. There are 16
possibilities:

$X_1,X_1$   $X_2,X_1$   $X_3,X_1$   $X_4,X_1$
$X_1,X_2$   $X_2,X_2$   $X_3,X_2$   $X_4,X_2$
$X_1,X_3$   $X_2,X_3$   $X_3,X_3$   $X_4,X_3$
$X_1,X_4$   $X_2,X_4$   $X_3,X_4$   $X_4,X_4$

In this illustration $p(x_1)$ is always equal to $\frac{1}{4}$ and $p(x_2)$
is always $\frac{1}{4}$. Hence each of the 16 possibilities has a
probability of $(\frac{1}{4})(\frac{1}{4}) = \frac{1}{16}$.

Each of the 16 possibilities is a different permutation that could
be regarded as a separate sample. However, in practice (as we are not
concerned about which element was selected first or second) it is more
logical to disregard the order of selection. Hence, as possible samples
and the probability of each occurring, we have:

Sample       Probability      Sample       Probability

$X_1,X_1$       1/16          $X_2,X_3$       1/8
$X_1,X_2$       1/8           $X_2,X_4$       1/8
$X_1,X_3$       1/8           $X_3,X_3$       1/16
$X_1,X_4$       1/8           $X_3,X_4$       1/8
$X_2,X_2$       1/16          $X_4,X_4$       1/16


Note that the sum of the probabilities is 1. That must always be the
case if all possible samples have been listed with the correct
probabilities. Also note that, since the probability (relative frequency
of occurrence) of each sample is known, the average for each sample is
a random variable. In other words, there were 10 possible samples, and
any one of 10 possible sample averages could have occurred with the
probability indicated. This is a simple illustration of the fact that
the sample average satisfies the definition of a random variable. As
the theory of sampling unfolds, we will be examining the properties of
a sample average that exist as a result of its being a random variable.
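
The listing in Illustration 2.3 can be reproduced by enumeration. The Python sketch below is only an illustration. The function name `unordered_sample_probs` and the use of element numbers 1 to 4 as labels are choices made here. It multiplies the selection probabilities for each ordered pair and then pools the permutations that contain the same elements.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def unordered_sample_probs(P, n=2):
    """Probabilities of unordered samples of size n drawn with replacement.

    P maps an element label to its selection probability.
    """
    probs = defaultdict(Fraction)          # missing keys start at 0
    for draw in product(P, repeat=n):      # every ordered outcome
        p = Fraction(1)
        for element in draw:
            p *= P[element]                # independent draws: multiply
        probs[tuple(sorted(draw))] += p    # disregard order of selection
    return dict(probs)

equal = {i: Fraction(1, 4) for i in (1, 2, 3, 4)}
samples = unordered_sample_probs(equal)
for sample, p in sorted(samples.items()):
    print(sample, p)
```

Passing a different dictionary of selection probabilities gives the corresponding results for unequal probabilities.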

Exercise 2.3. With reference to Illustration 2.3, suppose the
probabilities of selection were $P_1 = \frac{1}{4}$, $P_2 = \frac{1}{8}$,
$P_3 = \frac{3}{8}$, and $P_4 = \frac{1}{4}$. Find the probability of each
of the ten samples. Remember the sampling is with replacement. Check your
results by adding the 10 probabilities. The sum should be 1. Partial
answer: For the sample composed of elements 2 and 4 the probability is
$(\frac{1}{8})(\frac{1}{4}) + (\frac{1}{4})(\frac{1}{8}) = \frac{1}{16}$.

2.5  SAMPLING  WITHOUT  REPLACEMENT 

When a selected element is not returned to the population before the
next selection is made, the sampling method is called sampling without
replacement. In this case, the selection probabilities change from one
draw to the next; that is, the selections (events) are dependent.

As above, assume a population of N elements with values of some
characteristic X equal to $X_1, X_2, \ldots, X_N$. Let the selection
probabilities for the first selection be $P_1, \ldots, P_i, \ldots, P_N$
where each $P_i > 0$ and $\sum P_i = 1$. Suppose three elements are
selected without replacement. Let $x_1$, $x_2$, and $x_3$ be the values of
X obtained on the first, second, and third random draws, respectively.
What is the probability that $x_1 = X_5$, $x_2 = X_6$, and $x_3 = X_7$?
Let $P(X_5,X_6,X_7)$ represent this probability, which is the probability
of selecting elements 5, 6, and 7 in that order.

According to the multiplicative probability law for dependent events,

$$P(X_5,X_6,X_7) = P(X_5)\,P(X_6|X_5)\,P(X_7|X_5,X_6)$$

It is clear that $P(X_5) = P_5$. For the second draw the selection
probabilities (after element 5 is eliminated) must be adjusted so they add
to 1. Hence, for the second draw the selection probabilities are

$$\frac{P_1}{1-P_5},\ \frac{P_2}{1-P_5},\ \frac{P_3}{1-P_5},\ \frac{P_4}{1-P_5},\ \frac{P_6}{1-P_5},\ \ldots,\ \frac{P_N}{1-P_5}$$

That is, $P(X_6|X_5) = \frac{P_6}{1-P_5}$. Similarly,
$P(X_7|X_5,X_6) = \frac{P_7}{1-P_5-P_6}$. Therefore,

$$P(X_5,X_6,X_7) = (P_5)\left(\frac{P_6}{1-P_5}\right)\left(\frac{P_7}{1-P_5-P_6}\right) \tag{2.1}$$

Observe that
$P(X_6,X_5,X_7) = (P_6)\left(\frac{P_5}{1-P_6}\right)\left(\frac{P_7}{1-P_6-P_5}\right)$.
Hence, $P(X_5,X_6,X_7) \ne P(X_6,X_5,X_7)$ unless $P_5 = P_6$. In general,
each permutation of n elements has a different probability of occurrence
unless the $P_i$'s are all equal. To obtain the exact probability of
selecting a sample composed of elements 5, 6, and 7, one would need to
compute the probability for each of the six possible permutations and get
the sum of the six probabilities.
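
Equation (2.1) and the sum over the six permutations can be sketched in Python as follows. The selection probabilities used here (proportional to 1, 2, ..., 8 for a population of N = 8) are hypothetical values chosen only for illustration, and the function names are introduced here.

```python
from fractions import Fraction
from itertools import permutations

def ordered_prob(order, P):
    """Probability of drawing the elements in `order` without replacement."""
    prob = Fraction(1)
    remaining = Fraction(1)         # total probability of elements not yet drawn
    for element in order:
        prob *= P[element] / remaining   # conditional probability at this draw
        remaining -= P[element]          # remove the drawn element
    return prob

def sample_prob(elements, P):
    """Probability that the sample consists of `elements`, in any order."""
    return sum(ordered_prob(order, P) for order in permutations(elements))

# Hypothetical selection probabilities: P_i = i / 36 for i = 1, ..., 8
P = {i: Fraction(i, 36) for i in range(1, 9)}

print(ordered_prob((5, 6, 7), P))   # elements 5, 6, 7 in that order
print(ordered_prob((6, 5, 7), P))   # a different order, a different probability
print(sample_prob((5, 6, 7), P))    # sum over all six permutations
```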

Incidentally, in the actual process of selection, it is not necessary
to compute a new set of selection probabilities after each selection
is made. Make each selection in the same way that the first selection
was made. If an element is selected which has already been drawn, ignore
the random number and continue the same process of random selection
until a new element is drawn.

As  indicated  by  the  very  brief  discussion  in  this  section,  the 
theory  of  sampling  without  replacement  and  with  unequal  probability  of 
selection  can  be  very  complex.  However,  books  on  sampling  present  ways 
of  circumventing  the  complex  problems.  In  fact,  it  is  practical  and 
advantageous  in  many  cases  to  use  unequal  probability  of  selection  in 
sampling.  The  probability  theory  for  sampling  with  equal  probability 
of  selection  and  without  replacement  is  relatively  simple  and  will  be 
discussed  in  more  detail. 

Exercise 2.4. For a population of 4 elements there are six possible
samples of two when sampling without replacement. Let $P_1 = \frac{1}{4}$,
$P_2 = \frac{1}{8}$, $P_3 = \frac{3}{8}$, and $P_4 = \frac{1}{4}$. List the
six possible samples and find the probability of getting each sample.
Should the probabilities for the six samples add to 1? Check your results.

Exercise 2.5. Suppose two elements are selected with replacement
and with equal probability from a population of 100 elements. Find the
probability: (a) that element number 10 is not selected, (b) that element
number 10 is selected only once, and (c) that element number 10 is
selected twice. As a check, the three probabilities should add to 1.
Why? Find the probability of selecting the combination of elements 10
and 20.

Exercise  2.6.  Refer  to  Exercise  2.5  and  change  the  specification 
"with  replacement"  to  "without  replacement."  Answer  the  same  questions. 
Why  is  the  probability  of  getting  the  combination  of  elements  10  and  20 
greater  than  it  was  in  Exercise  2.5? 


2.6  SIMPLE  RANDOM  SAMPLES 

In  practice,  nearly  all  samples  are  selected  without  replacement. 
Selection  of  a random  sample  of  n elements,  with  equal  probability  and 
without  replacement,  from  a population  of  N elements  is  called  simple 
random  sampling  (srs).  One  element  must  be  selected  at  a time,  that  is, 
n separate  random  selections  are  required. 

First, the probability of getting a particular combination of n
elements will be discussed. Refer to Equation (2.1) and the discussion
preceding it. The $P_i$'s are all equal to $\frac{1}{N}$ for simple random
sampling. Therefore, Equation (2.1) becomes
$P(X_5,X_6,X_7) = \left(\frac{1}{N}\right)\left(\frac{1}{N-1}\right)\left(\frac{1}{N-2}\right)$.
All permutations of the three elements 5, 6, and 7 have the same
probability of occurrence. There are 3! = 6 possible permutations.
Therefore, the probability that the sample is composed of the elements
5, 6, and 7 is $\frac{(1)(2)(3)}{N(N-1)(N-2)}$. Any other combination of
three elements has the same probability of occurrence.

In general, all possible combinations of n elements have the same
chance of selection and any particular combination of n has the following
probability of being selected:

$$\frac{(1)(2)(3)\cdots(n)}{N(N-1)(N-2)\cdots(N-n+1)} = \frac{n!\,(N-n)!}{N!} \tag{2.2}$$

According to a theorem on number of combinations, there are
$\frac{N!}{n!\,(N-n)!}$ possible combinations (samples) of n elements. If
each combination of n elements has the same chance of being the sample
selected, the probability of selecting a specified combination must be
the reciprocal of the number of combinations. This checks with
Equation (2.2).
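
As a numerical check of Equation (2.2), the short Python sketch below builds the left-hand product factor by factor and compares it with the reciprocal of the number of combinations. The function name `srs_combination_prob` and the values N = 12, n = 3 are chosen here for illustration only.

```python
from fractions import Fraction
from math import comb, factorial

def srs_combination_prob(N, n):
    """(1)(2)...(n) / (N (N-1) ... (N-n+1)), built one factor at a time."""
    prob = Fraction(1)
    for k in range(n):
        prob *= Fraction(k + 1, N - k)
    return prob

N, n = 12, 3
p = srs_combination_prob(N, n)
print(p, Fraction(1, comb(N, n)))
```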

An important feature of srs that will be needed in the chapter on
expected values is the fact that the $j$th element of the population is as
likely to be selected at the $i$th random draw as any other. A general
expression for the probability that the $j$th element of the population is
selected at the $i$th drawing is

$$\left(\frac{N-1}{N}\right)\left(\frac{N-2}{N-1}\right)\left(\frac{N-3}{N-2}\right)\cdots\left(\frac{N-i+1}{N-i+2}\right)\left(\frac{1}{N-i+1}\right) = \frac{1}{N} \tag{2.3}$$

Let us check Equation (2.3) for i = 3. The equation becomes

$$\left(\frac{N-1}{N}\right)\left(\frac{N-2}{N-1}\right)\left(\frac{1}{N-2}\right) = \frac{1}{N}$$

The probability that the $j$th element of the population is selected at the
third draw is equal to the probability that it was not selected at either
the first or second draw times the conditional probability of being
selected at the third draw, given that it was not selected at the first
or second draw. (Remember, the sampling is without replacement.) Notice
that $\frac{N-1}{N}$ is the probability that the $j$th element is not
selected at the first draw and $\frac{N-2}{N-1}$ is the conditional
probability that it was not selected at the second draw. Therefore,
$\left(\frac{N-1}{N}\right)\left(\frac{N-2}{N-1}\right)$ is the probability
that the $j$th element has not been selected prior to the third draw. When
the third draw is made, the conditional probability of selecting the $j$th
element is $\frac{1}{N-2}$. Hence the probability of selecting the $j$th
element at the third draw is
$\left(\frac{N-1}{N}\right)\left(\frac{N-2}{N-1}\right)\left(\frac{1}{N-2}\right) = \frac{1}{N}$.
This verifies Equation (2.3) for i = 3.

To summarize, the general result for any size of sample is that the
$j$th element in a population has a probability equal to $\frac{1}{N}$ of
being selected at the $i$th drawing. It means that $x_i$ (the value of X
obtained at the $i$th draw) is a random variable that has a probability of
$\frac{1}{N}$ of being equal to any value of the set $X_1, \ldots, X_N$.

What probability does the $j$th element have of being included in a
sample of n? We have just shown that it has a probability of $\frac{1}{N}$
of being selected at the $i$th drawing. Therefore, any given element of the
population has n chances, each equal to $\frac{1}{N}$, of being included in
a sample. The element can be selected at the first draw, or the second
draw, ..., or the $n$th draw and it cannot be selected twice because the
sampling is without replacement. Therefore the probabilities,
$\frac{1}{N}$ for each of the n draws, can be added which gives
$\frac{n}{N}$ as the probability of any given element being included in
the sample.
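
Both results can be confirmed by brute-force enumeration for a small population. The Python sketch below is an illustration under the stated srs assumption that every ordered sequence of n distinct elements is equally likely. The function name and the values N = 6, n = 3 are chosen here.

```python
from fractions import Fraction
from itertools import permutations

def draw_and_inclusion_probs(N, n, j=1):
    """Enumerate all equally likely ordered srs outcomes for a small N.

    Returns the probability that element j is selected at each draw
    and the probability that element j is in the sample at all.
    """
    outcomes = list(permutations(range(1, N + 1), n))
    weight = Fraction(1, len(outcomes))    # each ordered outcome is equally likely
    at_draw = [sum(weight for o in outcomes if o[i] == j) for i in range(n)]
    included = sum(weight for o in outcomes if j in o)
    return at_draw, included

at_draw, included = draw_and_inclusion_probs(N=6, n=3)
print(at_draw)     # probability of selection at draws 1, 2, 3
print(included)    # probability of inclusion in the sample
```

The inclusion probability equals the sum of the per-draw probabilities because selection at different draws are mutually exclusive events.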

Illustration 2.4. Suppose one has a list of 1,000 farms which includes
some farms that are out-of-scope (not eligible) for a survey. There is no
way of knowing in advance whether a farm on the list is out-of-scope. A
simple random sample of 200 farms is selected from the list. All 200 farms
are visited but only the ones found to be in scope are included in the
sample. What probability does an in-scope farm have of being in the
sample? Every farm on the list of 1000 farms has a probability equal to
$\frac{1}{5}$ of being in the sample of 200. All in-scope farms in the
sample of 200 are included in the final sample. Therefore, the answer is
$\frac{1}{5}$.

Exercise 2.7. From the following set of 12 values of X a srs of
three elements is to be selected: 2, 10, 5, 8, 1, 15, 7, 8, 13, 4, 6,
and 2. Find $P(\bar{x} \ge 12)$ and $P(3 < \bar{x} < 12)$. Remember that the
total possible number of samples of 3 can readily be obtained by formula.
Since every possible sample of three is equally likely, you can determine
which samples will have an $\bar{x} \le 3$ or an $\bar{x} \ge 12$ without
listing all of the numerous possible samples. Answer:
$P(\bar{x} \ge 12) = \frac{3}{220}$, $P(\bar{x} \le 3) = \frac{9}{220}$,
$P(3 < \bar{x} < 12) = \frac{208}{220}$.

2.7  SOME  EXAMPLES  OF  RESTRICTED  RANDOM  SAMPLING 

There are many methods other than srs that will give every element
an equal chance of being in the sample, but some combinations of n
elements do not have a chance of being the sample selected unless srs is
used. For example, one might take every $k$th element beginning from a
random starting point between 1 and k. This is called systematic
sampling. For a five percent sample k would be 20. The first element for
the sample would be a random number between 1 and 20. If it is 12, then
elements 12, 32, 52, etc., compose the sample. Every element has an
equal chance, $\frac{1}{20}$, of being in the sample, but there are only 20
combinations of elements that have a chance of being the sample selected.
Simple random sampling could have given the same sample but it is the
method of sampling that characterizes a sample and determines how error
due to sampling is to be estimated. One may think of sample design as a
matter of choosing a method of sampling; that is, choosing restrictions
to place on the process of selecting a sample so the combinations which
have a chance of being the sample selected are generally "better" than
many of the combinations that could occur with simple random sampling.
At the same time, important properties that exist for simple random
samples need to be retained. The key properties of srs will be developed
in the next two chapters.
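
The systematic sampling example can be sketched in Python. The population size N = 100 is a hypothetical value chosen here, since the text specifies only k = 20, and `systematic_samples` is a name introduced for the illustration.

```python
from fractions import Fraction

def systematic_samples(N, k):
    """All k possible systematic samples from elements 1..N, one per random start."""
    return [list(range(start, N + 1, k)) for start in range(1, k + 1)]

N, k = 100, 20                       # N is an assumed population size
samples = systematic_samples(N, k)

# Each of the k starting points is equally likely, so each sample has probability 1/k.
inclusion = {
    element: sum(Fraction(1, k) for s in samples if element in s)
    for element in range(1, N + 1)
}

print(len(samples))      # number of combinations that can be the sample
print(samples[11])       # the sample for random start 12
print(inclusion[52])     # chance that element 52 is in the sample
```

Only 20 combinations can occur here, whereas srs of the same size from 100 elements allows every combination of five elements.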

Another common method of sampling involves classification of all
elements of a population into groups called strata. A sample is selected
from each stratum. Suppose $N_i$ elements of the population are in the
$i$th stratum and a simple random sample of $n_i$ elements is selected from
it. This is called stratified random sampling. It is clear that every
element in the $i$th stratum has a probability equal to $\frac{n_i}{N_i}$
of being in the sample. If the sampling fraction, $\frac{n_i}{N_i}$, is the
same for all strata, every element of the population has an equal chance,
namely $\frac{n_i}{N_i}$, of being in the sample. Again every element of
the population has an equal chance of selection and of being in the
sample selected, but some combinations that could occur when the method
is srs cannot occur when stratified random sampling is used.

So  far,  our  discussion  has  referred  to  the  selection  of  individual 
elements,  which  are  the  units  that  data  pertain  to.  For  sampling  purposes 
a population  must  be  divided  into  parts  which  are  called  sampling  units. 

A sample  of  sampling  units  is  then  selected.  Sampling  units  and  elements 
could  be  identical.  But  very  often,  it  is  either  not  possible  or  not 
practical  to  use  individual  elements  as  sampling  units.  For  example, 
suppose a sample of households is needed. A list of households does not
exist but a list of blocks covering the area to be surveyed might be
available. In this case, a sample of blocks might be selected and all
households within the selected blocks included in the sample. The blocks
are the
sampling  units  and  the  elements  are  households.  Every  element  of  the 
population  should  belong  to  one  and  only  one  sampling  unit  so  the  list  of 
sampling  units  will  account  for  all  elements  of  the  population  without 
duplication  or  omission.  Then,  the  probability  of  selecting  any  given 
element  is  the  same  as  the  probability  of  selecting  the  sampling  unit 
that  it  belongs  to. 

Illustration 2.5. Suppose a population is composed of 1800 dwelling
units located within 150 well-defined blocks. There are several possible
sampling plans. A srs of 25 blocks could be selected and every dwelling
unit in the selected blocks could be included in the sample. In this
case, the sampling fraction is $\frac{1}{6}$ and every dwelling unit has a
probability of $\frac{1}{6}$ of being in the sample. Is this a srs of
dwelling units? No, but one could describe the sample as a random sample
(or a probability sample) of dwelling units and state that every dwelling
unit had an equal chance of being in the sample. That is, the term
"simple random sample" would apply to blocks, not dwelling units. As an
alternative sampling plan, if there were twelve dwelling units in each of
the 150 blocks, a srs of two dwelling units could be selected from each
block. This scheme, which is an example of stratified random sampling,
would also give every dwelling unit a probability equal to $\frac{1}{6}$
of being in the sample.

Illustration  2.6.  Suppose  that  a sample  is  desired  of  100  adults 
living  in  a specified  area.  A list  of  adults  does  not  exist,  but  a list 
of  4,000  dwelling  units  in  the  area  is  available.  The  proposed  sampling 
plan  is  to  select  a srs  of  100  dwelling  units  from  the  list.  Then,  the 
field staff is to visit the sample dwellings and list all adults living
in each. Suppose there are 220 adults living in the 100 dwelling units.

A simple  random  sample  of  100  adults  is  selected  from  the  list  of  220. 
Consider  the  probability  that  an  adult  in  the  population  has  of  being  in 
the  sample  of  100  adults. 

Parenthetically, we should recognize that the discussion which
follows overlooks important practical problems of definition such as the
definition of a dwelling unit, the definition of an adult, and the
definition of living in a dwelling unit. However, assume the definitions
are clear, that the list of dwelling units is complete, that no dwelling
is on the list more than once, and that no ambiguity exists about whether
an adult lives or does not live in a particular dwelling unit. Incomplete
definitions often lead to inexact probabilities or ambiguity that
gives difficulty in analyzing or interpreting results. The many practical
problems should be discussed in an applied course on sampling.

It is clear that the probability of a dwelling unit being in the
sample is $\frac{1}{40}$. Therefore, every person on the list of 220 had a
chance of $\frac{1}{40}$ of being on the list because, under the
specifications, a person lives in one and only one dwelling unit, and an
adult's chance of being on the list is the same as that of the dwelling
unit he lives in.

The second phase of sampling involves selecting a simple random
sample of 100 adults from the list of 220. The conditional probability
of an adult being in the sample of 100 is $\frac{100}{220} = \frac{5}{11}$.
That is, given the fact that an adult is on the list of 220, he now has a
chance of $\frac{5}{11}$ of being in the sample of 100.

Keep in mind that the probability of an event happening is its
relative frequency in repeated trials. If another sample were selected
following the above specifications, each dwelling unit in the population
would again have a chance of $\frac{1}{40}$ of being in the sample; but the
number of adults listed is not likely to be 220, so the conditional
probability at the second phase depends upon the number of adults listed
in the sample dwelling units. Does every adult have the same chance of
being in the sample? Examine the case carefully. An initial impression
could be misleading. Every adult in the population has an equal chance of
being listed in the first phase and every adult listed has an equal
chance of being selected at the second phase. But, in terms of repetition
of the whole sampling plan, each person does not have exactly the same
chance of being in the sample of 100. The following exercise will help
clarify the situation and is a good exercise in probability.

Exercise 2.8. Assume a population of 5 d.u.'s (dwelling units) with
the following numbers of adults:

Dwelling Unit    No. of Adults

      1                2
      2                4
      3                1
      4                2
      5                3

A srs of two d.u.'s is selected. A srs of 2 adults is then selected from
a list of all adults in the two d.u.'s. Find the probability that a
specified adult in d.u. No. 1 has of being in the sample. Answer: 0.19.
Find the probability that an adult in d.u. No. 2 has of being in the
sample. Does the probability of an adult being in the sample appear to be
related to the number of adults in his d.u.? In what way?

An alternative is to take a constant fraction of the adults listed
instead of a constant number. For example, the specification might have
been to select a random sample of $\frac{1}{2}$ of the adults listed in the
first phase. In this case, under repeated application of the sampling
specifications, the probability at the second phase does not depend on the
outcome of the first phase and each adult in the population has an equal
chance, $(\frac{1}{40})(\frac{1}{2}) = \frac{1}{80}$, of being selected in
the sample. Notice that under this plan the number of adults in a sample
will vary from sample to sample; in fact, the number of adults in the
sample is a random variable.
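
The fixed-number plan of Exercise 2.8 can be checked by enumerating the ten equally likely pairs of dwelling units. The Python sketch below is an illustration only; the function name `two_phase_prob` is introduced here. For each pair containing the adult's dwelling unit, it multiplies the probability of the pair by the conditional probability that the adult is one of the 2 adults selected from those listed.

```python
from fractions import Fraction
from itertools import combinations

def two_phase_prob(adults, unit, n_units=2, n_adults=2):
    """Probability that a specified adult living in `unit` is in the final sample.

    `adults` maps a dwelling unit number to its number of adults.
    """
    pairs = list(combinations(adults, n_units))     # first phase: srs of d.u.'s
    total = Fraction(0)
    for pair in pairs:
        if unit in pair:
            listed = sum(adults[u] for u in pair)   # adults listed in this outcome
            second_phase = Fraction(min(n_adults, listed), listed)
            total += Fraction(1, len(pairs)) * second_phase
    return total

adults = {1: 2, 2: 4, 3: 1, 4: 2, 5: 3}
p_unit1 = two_phase_prob(adults, unit=1)
print(float(p_unit1))    # compare with the answer given in Exercise 2.8
```

Calling the same function with `unit=2` gives the second probability asked for in the exercise.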

For some surveys, interviewing more than one adult in a dwelling unit
is inadvisable. Again, suppose the first phase of sampling is to select
a srs of 100 dwelling units. For the second phase, consider the following:
When an interviewer completes the listing of adults in a sample dwelling,
he is to select one adult, from the list of those living in the dwelling,
at random in accordance with a specified set of instructions. He then
interviews the selected adult if available; otherwise, he returns at a
time when the selected adult is available. What probability does an adult
living in the area have of being in the sample? According to the
multiplication theorem, the answer is $P(D)P(A|D)$ where $P(D)$ is the
probability of the dwelling unit, in which the adult lives, being in the
sample and $P(A|D)$ is the probability of the adult being selected given
that his dwelling is in the sample. More specifically,
$P(D) = \frac{1}{40}$ and $P(A|D) = \frac{1}{k_i}$, where $k_i$ is the
number of adults in the $i$th dwelling. Thus, an adult's chance,
$(\frac{1}{40})(\frac{1}{k_i})$, of being in a sample is inversely
proportional to the number of adults in his dwelling unit.
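
A small Python sketch of this product follows. The function name and default arguments are chosen here for illustration and correspond to the srs of 100 dwelling units from a list of 4,000.

```python
from fractions import Fraction

def adult_selection_prob(k_i, n_dwellings=100, N_dwellings=4000):
    """P(D) * P(A|D) when one adult is chosen at random in each sample dwelling."""
    p_dwelling = Fraction(n_dwellings, N_dwellings)   # P(D) = 1/40
    p_adult_given_dwelling = Fraction(1, k_i)         # P(A|D) = 1/k_i
    return p_dwelling * p_adult_given_dwelling

for k in (1, 2, 4):
    print(k, adult_selection_prob(k))
```

An adult living alone is four times as likely to be in the sample as an adult in a dwelling with four adults.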

Exercise 2.9. Suppose there are five dwelling units and 12 persons
living in the five dwelling units as follows:

Dwelling Unit    Individuals

      1          1, 2
      2          3, 4, 5, 6
      3          7, 8
      4          9
      5          10, 11, 12

1. A sample of two dwelling units is selected with equal probability
and without replacement. All individuals in the selected dwelling units
are in the sample. What probability does individual number 4 have of being
in the sample? Individual number 9?

2. Suppose from a list of the twelve individuals that one individual
is selected with equal probability. From the selected individual two
items of information are obtained: his age and the value of the dwelling
in which he lives. Let $X_1, X_2, \ldots, X_{12}$ represent the ages of the
12 individuals and let $Y_1, \ldots, Y_5$ represent the values of the five
dwelling units. Clearly, the probability of selecting the $i$th individual
is $\frac{1}{12}$ and therefore $P(X_i) = \frac{1}{12}$. Find the five
probabilities $P(Y_1), \ldots, P(Y_5)$. Do you agree that
$P(Y_3) = \frac{2}{12}$? As a check, $\sum P(Y_j)$ should equal one.

3. Suppose a sample of two individuals is selected with equal
probability and without replacement. Let $Y_{1j}$ be the value of Y
obtained at the first draw and $Y_{2j}$ be the value of Y obtained at the
second draw. Does $P(Y_{1j}) = P(Y_{2j})$? That is, is the probability of
getting $Y_j$ on the second draw the same as it was on the first? If the
answer is not evident, refer to Section 2.5.

Exercise 2.10. A small sample of third-grade students enrolled in
public schools in a State is desired. The following plan is presented only
as an exercise and without consideration of whether it is a good one: A
sample of 10 third-grade classes is to be selected. All students in the
10 classes will be included in the sample.

Step 1. Select a srs of 10 school districts.

Step 2. Within each of the 10 school districts, prepare a list
of public schools having a third grade. Then select one
school at random from the list.

Step 3. For each of the 10 schools resulting from Step 2, list
the third-grade classes and select one class at random.
(If there is only one third-grade class in the school,
it is in the sample). This will give a sample of 10 classes.

Describe  third-grade  classes  in  the  population  which  have  relatively 
small  chances  of  being  selected.  Define  needed  notation  and  write  a 
mathematical  expression  representing  the  probability  of  a third-grade 
class  being  in  the  sample. 

2.8  TWO-STAGE  SAMPLING 

For  various  reasons  sampling  plans  often  employ  two  or  more  stages 
of  sampling.  For  example,  a sample  of  counties  might  be  selected,  then 
within  each  sample  county  a sample  of  farms  might  be  selected. 

Units  used  at  the  first  stage  of  sampling  are  usually  called  primary 
sampling  units  or  psu’s.  The  sampling  units  at  the  second  stage  of  sam- 
pling could  be  called  secondary  sampling  units.  However,  since  there  has 
been  frequent  reference  earlier  in  this  chapter  to  ’’elements  of  a popula- 
tion," the  sampling  units  at  the  second  stage  will  be  called  elements. 

In the simple case of two-stage sampling, each element of the
population is associated with one and only one primary sampling unit. Let i
be the index for psu's and let j be the index for elements within a psu.
Thus $X_{ij}$ represents the value of some characteristic X for the
$j^{th}$ element in the $i^{th}$ psu. Also, let

M = the total number of psu's,

m = the number of psu's selected for a sample,

$N_i$ = the total number of elements in the $i^{th}$ psu, and

$n_i$ = the number of elements in the sample from the $i^{th}$ psu.

Then,

$\sum_{i}^{M} N_i = N$, the total number of elements in the population, and

$\sum_{i}^{m} n_i = n$, the total number of elements in the sample.

Now consider the probability of an element being selected by a
two-step process: (1) Select one psu, and (2) select one element within the
selected psu. Let

$P_i$ = the probability of selecting the $i^{th}$ psu,

$P_{j|i}$ = the conditional probability of selecting the $j^{th}$
element in the $i^{th}$ psu given that the $i^{th}$ psu has already
been selected, and

$P_{ij}$ = the overall probability of selecting the $j^{th}$ element in
the $i^{th}$ psu.

Then,

$$P_{ij} = P_i P_{j|i}$$


If the product of the two probabilities, $P_i$ and $P_{j|i}$, is constant for
every element, then every element of the population has an equal chance of
being selected. In other words, given a set of selection probabilities
$P_1, \ldots, P_M$ for the psu's, one could specify that $P_{ij} = \frac{1}{N}$
and compute $P_{j|i}$, where $P_{j|i} = \frac{1}{N P_i}$, so every element of
the population will have an equal chance of selection.

Exercise 2.11. Refer to Table 2.1. An element is to be selected by
a three-step process as follows: (1) Select one of the Y classes (a row)
with probability $\frac{N_{i\cdot}}{N}$, (2) within the selected row select an
X class (a column) with probability $\frac{N_{ij}}{N_{i\cdot}}$, (3) within the
selected cell select an element with equal probability. Does each element in
the population of N elements have an equal probability of being drawn? What
is the probability?

The probability of an element being included in a two-stage sample
is given by

$$P'_{ij} = P'_i P'_{j|i} \tag{2.4}$$

where

$P'_i$ = the probability that the $i^{th}$ psu is in the sample
of psu's, and

$P'_{j|i}$ = the conditional probability which the $j^{th}$ element has
of being in the sample, given that the $i^{th}$ psu has
been selected.

The inclusion probability $P'_{ij}$ will be discussed very briefly for three
important cases:

(1) Suppose a random sample of m psu's is selected with equal
probability and without replacement. The probability, $P'_i$, of the $i^{th}$
psu being in the sample is $f_1 = \frac{m}{M}$ where $f_1$ is the sampling
fraction for the first-stage units. In the second stage of sampling assume
that, within each of the m psu's, a constant proportion, $f_2$, of the
elements is selected. That is, in the $i^{th}$ psu in the sample, a simple
random sample of $n_i$ elements out of $N_i$ is selected, the condition being
that $n_i = f_2 N_i$. Hence, the conditional probability of the $j^{th}$
element in the $i^{th}$ psu being in the sample is
$P'_{j|i} = \frac{n_i}{N_i} = f_2$. Substituting in Equation 2.4, we have
$P'_{ij} = f_1 f_2$ which shows that an element's probability of being in the
sample is equal to the product of the sampling fractions at the two stages.
In this case $P'_{ij}$ is constant and is the overall sampling fraction.

Unless $N_i$ is the same for all psu's, the size of the sample,
$n_i = f_2 N_i$, varies from psu to psu. Also, since the psu's are selected
at random the total size of the sample,
$n = \sum_{i}^{m} n_i = f_2 \sum_{i}^{m} N_i$, is not constant
with regard to repetition of the sampling plan. In practice variation in
the size, $n_i$, of the sample from psu to psu might be very undesirable. If
appropriate information is available, it is possible to select psu's with
probabilities that will equalize the sample sizes $n_i$ and also keep
$P'_{ij}$ constant.
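The behavior described in case (1) can be checked numerically. The sketch below is only an illustration: the psu sizes, m, and the second-stage fraction are hypothetical values chosen so that every $n_i$ is a whole number. It lists the total sample size for every possible first-stage sample and shows that the inclusion probability stays fixed while n does not.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical psu sizes N_i (not from the text), chosen so f2 * N_i is whole.
psu_sizes = [20, 40, 60, 80]
M = len(psu_sizes)
m = 2                      # psu's selected at the first stage
f1 = Fraction(m, M)        # first-stage sampling fraction, m / M
f2 = Fraction(1, 10)       # constant second-stage sampling fraction

# Every element has the same inclusion probability f1 * f2.
inclusion_probability = f1 * f2

# Total sample size n = f2 * (sum of N_i over the selected psu's),
# computed for every possible first-stage sample of m psu's.
sample_sizes = {
    chosen: f2 * sum(psu_sizes[i] for i in chosen)
    for chosen in combinations(range(M), m)
}

print("inclusion probability:", inclusion_probability)
for chosen, n in sample_sizes.items():
    print("psu's", chosen, "-> n =", n)
```

With these assumed sizes the total sample size ranges from 6 to 14 depending on which psu's are drawn, although each element's chance of inclusion is the same.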


(2) Suppose one psu is selected with probability $P_i = \frac{N_i}{N}$. This
is commonly known as sampling with pps (probability proportional to size).
Within the selected psu, assume that a simple random sample of k elements
is selected. (If any $N_i$ are less than k, consolidations could be made so
all psu's have an $N_i$ greater than k.) Then,

$$P'_i = \frac{N_i}{N}, \quad P'_{j|i} = \frac{k}{N_i}, \quad \text{and} \quad P'_{ij} = \frac{N_i}{N} \cdot \frac{k}{N_i} = \frac{k}{N}$$

which means that every element of the population has an equal probability,
$\frac{k}{N}$, of being included in a sample of k elements.


Extension of this sampling scheme to a sample of m psu's could
encounter the complications indicated in Section 2.5. However, it was
stated that means exist for circumventing those complications. Sampling
books 1/ discuss this matter quite fully so we will not include it in this
monograph. The point is that one can select m psu's without replacement
in such a way that $m \frac{N_i}{N}$ is the probability of including the
$i^{th}$ psu in the sample. That is, $P'_i = m \frac{N_i}{N}$. If a random
sample of k elements is selected with equal probability from each of the
selected psu's, $P'_{j|i} = \frac{k}{N_i}$ and

$$P'_{ij} = \left(m \frac{N_i}{N}\right)\left(\frac{k}{N_i}\right) = \frac{mk}{N} = \frac{n}{N}$$


Thus, if the $N_i$ are known exactly for all M psu's in the population,
and if a list of elements in each psu is available, it is possible to
select a two-stage sample of n elements so that k elements for the sample
come from each of m psu's and every element of the population has an equal
chance of being in the sample. In practice, however, one usually finds
one of two situations: (a) there is no information on the number of
elements in the psu's, or (b) the information that does exist is out-of-date.
Nevertheless, out-of-date information on number of elements in the psu's
can be very useful. It is also possible that a measure of size might
exist which will serve, more efficiently, the purposes of sampling.
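As a check on case (2), the following sketch multiplies the two stage probabilities for each psu. The psu sizes, m, and k are hypothetical, and the sizes were chosen so that $m N_i / N$ does not exceed 1 for any psu. It does not show how to draw the psu's without replacement; it only confirms that the product is the same for every element.

```python
from fractions import Fraction

# Hypothetical psu sizes N_i, assumed known exactly.
psu_sizes = [30, 50, 60, 60]
N = sum(psu_sizes)   # total number of elements
m = 2                # psu's in the sample
k = 10               # elements taken from each selected psu

overall = {}
for i, N_i in enumerate(psu_sizes):
    p_psu = m * Fraction(N_i, N)           # P'_i = m N_i / N
    p_elem_given_psu = Fraction(k, N_i)    # P'_{j|i} = k / N_i
    overall[i] = p_psu * p_elem_given_psu  # P'_ij for any element of psu i
    print(N_i, p_psu, p_elem_given_psu, overall[i])
```

Under these assumptions every psu gives the same product, mk/N = 1/10, which is the overall sampling fraction n/N.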

(3) Suppose that characteristic Y is used as a measure of size. Let
$Y_i$ be the value of Y for the $i^{th}$ psu in the population and let
$P_i = \frac{Y_i}{Y}$ where $Y = \sum_{i}^{M} Y_i$. A sample of m psu's is
selected in such a way that $P'_i = m \frac{Y_i}{Y}$ is the probability that
the $i^{th}$ psu has of being in the sample.


1/ For example, Hansen, Hurwitz, and Madow. Sample Survey Methods and
Theory. Volume I, Chapter 8. John Wiley and Sons. 1953.

With regard to the second stage of sampling, let $f_{2i}$ be the sampling
fraction for selecting a simple random sample within the $i^{th}$ psu in the
sample. That is, $P'_{j|i} = f_{2i}$. Then,

$$P'_{ij} = \left(m \frac{Y_i}{Y}\right)(f_{2i}) \tag{2.5}$$


In setting sampling specifications one would decide on a fixed value
for $P'_{ij}$. In this context $P'_{ij}$ is the overall sampling fraction or
proportion of the population that is to be included in the sample. For
example, if one wanted a 5 percent sample, $P'_{ij}$ would be .05. Or, if one
knew there were approximately 50,000 elements in the population and wanted
a sample of about 2,000, he would set $P'_{ij} = .04$. Hence, we will let f be
the overall sampling fraction and set $P'_{ij}$ equal to f. Decisions are also
made on the measure of size to be used and on the number, m, of psu's to be
selected. In Equation 2.5, this leaves $f_{2i}$ to be determined. Thus,
$f_{2i}$ is computed as follows for each psu in the sample:

$$f_{2i} = \frac{fY}{mY_i}$$

Use of the sampling fractions $f_{2i}$ at the second stage of sampling will
give every element of the population a probability equal to f of being in
the sample. A sample wherein every element of the population has an equal
chance of inclusion is often called a self-weighted sample.
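The computation of the second-stage fractions can be sketched in a few lines. The measures of size, m, and f below are hypothetical values used only for illustration, and the function name is not from the text. The sketch computes $f_{2i} = fY/(mY_i)$ for each psu and confirms that the product with $P'_i = mY_i/Y$ returns f.

```python
from fractions import Fraction

# Hypothetical measures of size Y_i for M = 5 psu's (illustrative values only).
sizes = [100, 150, 250, 200, 300]
Y = sum(sizes)            # total measure of size
m = 2                     # number of psu's to be selected
f = Fraction(1, 20)       # desired overall sampling fraction (a 5 percent sample)

def second_stage_fraction(Y_i):
    """Within-psu sampling fraction f_2i = f Y / (m Y_i)."""
    return f * Y / (m * Y_i)

for Y_i in sizes:
    p_psu = Fraction(m * Y_i, Y)          # P'_i = m Y_i / Y
    f_2i = second_stage_fraction(Y_i)
    print(Y_i, p_psu, f_2i, p_psu * f_2i) # last column is always f
```

Small psu's receive large second-stage fractions and large psu's receive small ones, which is what keeps the overall probability constant.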
CHAPTER III. EXPECTED VALUES OF RANDOM VARIABLES

3.1 INTRODUCTION

The theory of expected values of random variables is used
extensively in the theory of sampling; in fact, it is the foundation for
sampling theory. Interpretations of the accuracy of estimates from
probability samples depend heavily on the theory of expected values.

The definition of a random variable was discussed in the previous
chapter. It is a variable that can take (be equal to) any one of a
defined set of values with known probability. Let $X_i$ be the value of X
for the $i^{th}$ element in a set of N elements and let $P_i$ be the
probability that the $i^{th}$ element has of being selected by some chance
operation so that $P_i$ is known a priori. What is the expected value of X?

Definition 3.1. The expected value of a random variable X is
$\sum_{i=1}^{N} P_i X_i$ where $\sum_{i=1}^{N} P_i = 1$. The mathematical
notation for the expected value of X is E(X). Hence, by definition,
$E(X) = \sum_{i=1}^{N} P_i X_i$.

Observe that $\sum P_i X_i$ is a weighted average of the values of X, the
weights being the probabilities of selection. "Expected value" is a
substitute expression for "average value." In other words, E means "the
average value of" or "find the average value of" whatever follows E. For
example, $E(X^2)$, read "the expected value of $X^2$," refers to the average
value of the squares of the values that X can equal. That is, by definition,

$$E(X^2) = \sum_{i=1}^{N} P_i X_i^2$$

If all of the N elements have an equal chance of being selected, all
values of $P_i$ must equal $\frac{1}{N}$ because of the requirement that
$\sum P_i = 1$. In this case,

$$E(X) = \sum_{i=1}^{N} \frac{1}{N} X_i = \frac{\sum X_i}{N} = \bar{X}$$

which is the simple average of X for all N elements.

Illustration 3.1. Assume 12 elements having values of X as follows:

    $X_1 = 3$     $X_5 = 5$     $X_9 = 10$
    $X_2 = 9$     $X_6 = 3$     $X_{10} = 3$
    $X_3 = 3$     $X_7 = 4$     $X_{11} = 8$
    $X_4 = 5$     $X_8 = 3$     $X_{12} = 4$

For this set, $E(X) = \frac{3 + 9 + \cdots + 4}{12} = 5$, assuming each element
has the same chance of selection. Or, by counting the number of times that
each unique value of X occurs, a frequency distribution of X can be obtained
as follows:

    $X_j$     $N_j$
      3         5
      4         2
      5         2
      8         1
      9         1
     10         1

where $X_j$ is a unique value of X and $N_j$ is the number of times $X_j$
occurs. We noted in Chapter I that $\sum N_j = N$, $\sum N_j X_j = \sum X_i$,
and that $\frac{\sum N_j X_j}{\sum N_j} = \frac{\sum X_i}{N} = \bar{X}$.
Suppose one of the $X_j$ values is selected at random with a probability equal
to $P_j$ where $P_j = \frac{N_j}{\sum N_j} = \frac{N_j}{N}$. What is the
expected value of $X_j$? By definition

$$E(X_j) = \sum P_j X_j = \sum \frac{N_j}{N} X_j = \frac{\sum N_j X_j}{N} = \bar{X}$$

The student may verify that in this illustration $E(X_j) = 5$. Note that the
selection specifications were equivalent to selecting one of the 12 elements
at random with equal probability.
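The two ways of computing the expected value in Illustration 3.1 can be verified with a short calculation. The sketch below uses the 12 values of the illustration; the variable names are only for this example.

```python
from collections import Counter
from fractions import Fraction

# The 12 values of X from Illustration 3.1.
values = [3, 9, 3, 5, 5, 3, 4, 3, 10, 3, 8, 4]
N = len(values)

# Element by element, each with probability 1/N.
e_by_element = sum(Fraction(1, N) * x for x in values)

# By unique value X_j, each with probability P_j = N_j / N.
frequencies = Counter(values)   # N_j for each unique X_j
probabilities = {x: Fraction(n_j, N) for x, n_j in frequencies.items()}
e_by_value = sum(p * x for x, p in probabilities.items())

print(e_by_element, e_by_value)   # both equal 5
```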

Incidentally, a frequency distribution and a probability distribution
are very similar. The probability distribution with reference to $X_j$ would
be:

    $X_j$     $P_j$
      3       5/12
      4       2/12
      5       2/12
      8       1/12
      9       1/12
     10       1/12

The 12 values, $P_i = \frac{1}{12}$, for the 12 elements are also a
probability distribution. This illustration shows two ways of treating the
set of 12 elements.

When finding expected values be sure that you understand the
definition of the set of values that the random variable might equal and the
probabilities involved.

Definition 3.2. When X is a random variable, by definition the
expected value of a function of X is

$$E[f(X)] = \sum_{i=1}^{N} P_i [f(X_i)]$$

Some examples of simple functions of X are: $f(X) = aX$, $f(X) = X^2$,
$f(X) = a + bX + cX^2$, and $f(X) = (X - \bar{X})^2$. For each value, $X_i$,
in a defined set there is a corresponding value of $f(X_i)$.

Illustration 3.2. Suppose $f(X) = 2X + 3$. With reference to the set
of 12 elements discussed above, there are 12 values of $f(X_i)$ as follows:

    $f(X_1) = (2)(3) + 3 = 9$
    $f(X_2) = (2)(9) + 3 = 21$
    . . .
    $f(X_{12}) = (2)(4) + 3 = 11$

Assuming $P_i = \frac{1}{N}$ the expected value of $f(X) = 2X + 3$ would be

$$E(2X+3) = \sum_{i}^{12} \frac{1}{12}(2X_i + 3) = \left(\frac{1}{12}\right)(9) + \left(\frac{1}{12}\right)(21) + \cdots + \left(\frac{1}{12}\right)(11) = 13 \tag{3.1}$$

In algebraic terms, for $f(X) = aX + b$, we have

$$E(aX+b) = \sum_{i=1}^{N} P_i(aX_i + b) = \sum P_i(aX_i) + \sum P_i b$$

By definition $\sum P_i(aX_i) = E(aX)$, and $\sum P_i b = E(b)$. Therefore,

$$E(aX+b) = E(aX) + E(b) \tag{3.2}$$

Since b is constant and $\sum P_i = 1$, $\sum P_i b = b$, which leads to the
first important theorem in expected values.

Theorem 3.1. The expected value of a constant is equal to the
constant: $E(a) = a$.

By definition $E(aX) = \sum P_i(aX_i) = a\sum P_i X_i$. Since
$\sum P_i X_i = E(X)$, we have another important theorem:

Theorem 3.2. The expected value of a constant times a variable equals
the constant times the expected value of the variable: $E(aX) = aE(X)$.

Applying these two theorems to Equation (3.2) we have
$E(aX+b) = aE(X) + b$. Therefore, with reference to Illustration 3.2,
$E(2X+3) = 2E(X) + 3 = 2(5) + 3 = 13$, which is the same as the result found
in Equation (3.1).
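The agreement between the direct computation and the result from Theorems 3.1 and 3.2 can be confirmed numerically. In the sketch below, the helper function `expected` is a hypothetical name introduced only for this example; it applies Definition 3.2 to the 12 values of Illustration 3.1.

```python
from fractions import Fraction

values = [3, 9, 3, 5, 5, 3, 4, 3, 10, 3, 8, 4]    # X_i from Illustration 3.1
probs = [Fraction(1, len(values))] * len(values)   # P_i = 1/N

def expected(f, xs, ps):
    """E[f(X)] = sum of P_i * f(X_i), as in Definition 3.2."""
    return sum(p * f(x) for x, p in zip(xs, ps))

a, b = 2, 3
direct = expected(lambda x: a * x + b, values, probs)          # sum of P_i (2 X_i + 3)
via_theorems = a * expected(lambda x: x, values, probs) + b    # a E(X) + b

print(direct, via_theorems)   # both equal 13
```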




Exercise 3.1. Suppose a random variable X can take any of the
following four values with the probabilities indicated:

    $X_1 = 2$       $X_2 = 5$       $X_3 = 4$       $X_4 = 6$
    $P_1 = 2/6$     $P_2 = 2/6$     $P_3 = 1/6$     $P_4 = 1/6$

(a) Find $E(X)$. Answer: 4

(b) Find $E(X^2)$. Answer: $18\frac{1}{3}$. Note that $E(X^2) \neq [E(X)]^2$.

(c) Find $E(X - \bar{X})$. Answer: 0. Note: By definition
$E(X - \bar{X}) = \sum_{i=1}^{4} P_i(X_i - \bar{X})$.

(d) Find $E(X - \bar{X})^2$. Answer: $2\frac{1}{3}$. Note: By definition
$E(X - \bar{X})^2 = \sum_{i=1}^{4} P_i(X_i - \bar{X})^2$.

Exercise 3.2. From the following set of three values of $Y_j$ one
value is to be selected with a probability $P'_j$:

    $Y_1 = -2$       $Y_2 = 2$        $Y_3 = 4$
    $P'_1 = 1/4$     $P'_2 = 2/4$     $P'_3 = 1/4$

(a) Find $E(Y)$. Answer: $1\frac{1}{2}$

(b) Find $E(\frac{1}{Y})$. Answer: 3/16. Note: $\frac{1}{E(Y)} \neq E(\frac{1}{Y})$.

(c) Find $E(Y - \bar{Y})^2$. Answer: $4\frac{3}{4}$


3.2  EXPECTED  VALUE  OF  THE  SUM  OF  TWO  RANDOM  VARIABLES 


The sum of two or more random variables is also a random variable.
If X and Y are two random variables, the expected value of X + Y is equal
to the expected value of X plus the expected value of Y:
$E(X+Y) = E(X) + E(Y)$. Two numerical illustrations will help clarify the
situation.

Illustration 3.3. Consider the two random variables X and Y in
Exercises 3.1 and 3.2:

    $X_1 = 2$     $P_1 = 2/6$          $Y_1 = -2$     $P'_1 = 1/4$
    $X_2 = 5$     $P_2 = 2/6$          $Y_2 = 2$      $P'_2 = 2/4$
    $X_3 = 4$     $P_3 = 1/6$          $Y_3 = 4$      $P'_3 = 1/4$
    $X_4 = 6$     $P_4 = 1/6$

Suppose one element of the first set and one element of the second
set are selected with probabilities as listed above. What is the expected
value of X + Y? The joint probability of getting $X_i$ and $Y_j$ is
$P_i P'_j$ because the two selections are independent. Hence by definition

$$E(X+Y) = \sum_{i}^{4} \sum_{j}^{3} P_i P'_j (X_i + Y_j) \tag{3.3}$$

The possible values of X + Y and the probability of each are as follows:

    X + Y                  $P_i P'_j$                 X + Y                  $P_i P'_j$
    $X_1 + Y_1 = 0$      $P_1 P'_1 = 2/24$        $X_3 + Y_1 = 2$      $P_3 P'_1 = 1/24$
    $X_1 + Y_2 = 4$      $P_1 P'_2 = 4/24$        $X_3 + Y_2 = 6$      $P_3 P'_2 = 2/24$
    $X_1 + Y_3 = 6$      $P_1 P'_3 = 2/24$        $X_3 + Y_3 = 8$      $P_3 P'_3 = 1/24$
    $X_2 + Y_1 = 3$      $P_2 P'_1 = 2/24$        $X_4 + Y_1 = 4$      $P_4 P'_1 = 1/24$
    $X_2 + Y_2 = 7$      $P_2 P'_2 = 4/24$        $X_4 + Y_2 = 8$      $P_4 P'_2 = 2/24$
    $X_2 + Y_3 = 9$      $P_2 P'_3 = 2/24$        $X_4 + Y_3 = 10$     $P_4 P'_3 = 1/24$

As a check the sum of the probabilities must be 1 if all possible
sums have been listed and the probability of each has been correctly
determined. Substituting the values of $X_i + Y_j$ and $P_i P'_j$ in
Equation (3.3) we obtain 5.5 as follows for expected value of X + Y:

$$\left(\frac{2}{24}\right)(0) + \left(\frac{4}{24}\right)(4) + \cdots + \left(\frac{1}{24}\right)(10) = 5.5$$

From Exercises 3.1 and 3.2 we have $E(X) = 4$ and $E(Y) = 1.5$.
Therefore, $E(X) + E(Y) = 4 + 1.5 = 5.5$ which verifies the earlier statement
that $E(X+Y) = E(X) + E(Y)$.
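The arithmetic of Illustration 3.3 can be reproduced by enumerating the 12 possible pairs. The sketch below is one way to do this; it takes the values and probabilities from Exercises 3.1 and 3.2 and forms the joint probabilities as products, which is valid only because the two selections are independent.

```python
from fractions import Fraction
from itertools import product

# Distributions from Exercises 3.1 and 3.2 as (value, probability) pairs.
X = [(2, Fraction(2, 6)), (5, Fraction(2, 6)), (4, Fraction(1, 6)), (6, Fraction(1, 6))]
Y = [(-2, Fraction(1, 4)), (2, Fraction(2, 4)), (4, Fraction(1, 4))]

# Independence: the joint probability of (X_i, Y_j) is P_i * P'_j.
joint = [(x + y, px * py) for (x, px), (y, py) in product(X, Y)]

total_probability = sum(p for _, p in joint)   # check: must equal 1
e_sum = sum(s * p for s, p in joint)           # E(X + Y) from Equation (3.3)
e_x = sum(x * p for x, p in X)                 # E(X)
e_y = sum(y * p for y, p in Y)                 # E(Y)

print(total_probability, e_sum, e_x + e_y)
```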

Illustration 3.4. Suppose a random sample of two is selected with
replacement from the population of four elements used in Exercise 3.1.
Let $x_1$ be the first value selected and let $x_2$ be the second. Then
$x_1$ and $x_2$ are random variables and $x_1 + x_2$ is a random variable.
The possible values of $x_1 + x_2$ and the probability of each,
$P(x_1, x_2)$, are listed below. Notice that each possible order of selection
is treated separately.


    $x_1$    $x_2$    $P(x_1,x_2)$    $x_1+x_2$          $x_1$    $x_2$    $P(x_1,x_2)$    $x_1+x_2$
    $X_1$    $X_1$       4/36             4              $X_3$    $X_1$       2/36             6
    $X_1$    $X_2$       4/36             7              $X_3$    $X_2$       2/36             9
    $X_1$    $X_3$       2/36             6              $X_3$    $X_3$       1/36             8
    $X_1$    $X_4$       2/36             8              $X_3$    $X_4$       1/36            10
    $X_2$    $X_1$       4/36             7              $X_4$    $X_1$       2/36             8
    $X_2$    $X_2$       4/36            10              $X_4$    $X_2$       2/36            11
    $X_2$    $X_3$       2/36             9              $X_4$    $X_3$       1/36            10
    $X_2$    $X_4$       2/36            11              $X_4$    $X_4$       1/36            12

By definition $E(x_1 + x_2)$ is

$$\frac{4}{36}(4) + \frac{4}{36}(7) + \frac{2}{36}(6) + \cdots + \frac{1}{36}(12) = 8$$

In Exercise 3.1 we found $E(X) = 4$. Since $x_1$ is the same random variable
as X, $E(x_1) = 4$. Also, $x_2$ is the same random variable as X, and
$E(x_2) = 4$. Therefore, $E(x_1) + E(x_2) = 8$, which verifies that
$E(x_1 + x_2) = E(x_1) + E(x_2)$.

In general if X and Y are two random variables, where X might equal
$X_1, \ldots, X_N$ and Y might equal $Y_1, \ldots, Y_M$, then
$E(X+Y) = E(X) + E(Y)$. The proof is as follows: By definition
$E(X+Y) = \sum_{i}^{N} \sum_{j}^{M} P_{ij}(X_i + Y_j)$ where $P_{ij}$ is
the probability of getting the sum $X_i + Y_j$, and
$\sum \sum P_{ij} = 1$. The double summation is over all possible values of
$P_{ij}(X_i + Y_j)$. According to the rules for summation we may write

$$\sum_{i}^{N} \sum_{j}^{M} P_{ij}(X_i + Y_j) = \sum_{i}^{N} \sum_{j}^{M} P_{ij} X_i + \sum_{i}^{N} \sum_{j}^{M} P_{ij} Y_j \tag{3.4}$$

In the first term on the right, $X_i$ is constant with regard to the summation
over j; and in the second term on the right, $Y_j$ is constant with regard
to the summation over i. Therefore, the right-hand side of Equation (3.4)
can be written as

$$\sum_{i}^{N} X_i \sum_{j}^{M} P_{ij} + \sum_{j}^{M} Y_j \sum_{i}^{N} P_{ij}$$

And, since $\sum_{j}^{M} P_{ij} = P_i$ and $\sum_{i}^{N} P_{ij} = P_j$,
Equation (3.4) becomes

$$\sum_{i}^{N} \sum_{j}^{M} P_{ij}(X_i + Y_j) = \sum_{i}^{N} X_i P_i + \sum_{j}^{M} Y_j P_j$$

By definition $\sum_{i}^{N} X_i P_i = E(X)$ and
$\sum_{j}^{M} Y_j P_j = E(Y)$.

Therefore $E(X+Y) = E(X) + E(Y)$.

If the proof is not clear write the values of $P_{ij}(X_i + Y_j)$ in a matrix
format. Then, follow the summation manipulations in the proof.

The above result extends to any number of random variables; that is,
the expected value of a sum of random variables is the sum of the expected
values of each. In fact, there is a very important theorem that applies
to a linear combination of random variables.

Theorem 3.3. Let $u = a_1 u_1 + \cdots + a_k u_k$, where $u_1, \ldots, u_k$
are random variables and $a_1, \ldots, a_k$ are constants. Then

$$E(u) = a_1 E(u_1) + \cdots + a_k E(u_k)$$

or in summation notation

$$E(u) = E \sum_{i}^{k} a_i u_i = \sum_{i}^{k} a_i E(u_i)$$

The generality of Theorem 3.3 is impressive. For example, with
reference to sampling from a population $X_1, \ldots, X_N$, $u_1$ might be the
value of X obtained at the first draw, $u_2$ the value obtained at the second
draw, etc. The constants could be weights. Thus, in this case, u would be a
weighted average of the sample measurements. Or, suppose
$\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are averages from a random sample for
k different age groups. The averages are random variables and the theorem
could be applied to any linear combination of the averages. In fact $u_i$
could be any function of random variables. That is, the only condition on
which the theorem is based is that $u_i$ must be a random variable.


Illustration 3.5. Suppose we want to find the expected value of
$(X+Y)^2$ where X and Y are random variables. Before Theorem 3.3 can be
applied we must square (X + Y). Thus $E(X+Y)^2 = E(X^2 + 2XY + Y^2)$.
The application of Theorem 3.3 gives
$E(X+Y)^2 = E(X^2) + 2E(XY) + E(Y^2)$.

Illustration 3.6. We will now show that

$$E(X - \bar{X})(Y - \bar{Y}) = E(XY) - \bar{X}\bar{Y} \quad \text{where } E(X) = \bar{X} \text{ and } E(Y) = \bar{Y}$$

Since $(X - \bar{X})(Y - \bar{Y}) = XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y}$
we have

$$E(X - \bar{X})(Y - \bar{Y}) = E(XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y})$$

and application of Theorem 3.3 gives

$$E(X - \bar{X})(Y - \bar{Y}) = E(XY) - E(\bar{X}Y) - E(\bar{Y}X) + E(\bar{X}\bar{Y})$$

Since $\bar{X}$ and $\bar{Y}$ are constant,
$E(\bar{X}Y) = \bar{X}E(Y) = \bar{X}\bar{Y}$, $E(\bar{Y}X) = \bar{Y}\bar{X}$,
and $E(\bar{X}\bar{Y}) = \bar{X}\bar{Y}$.
Therefore, $E(X - \bar{X})(Y - \bar{Y}) = E(XY) - \bar{X}\bar{Y}$.

Exercise  3.3.  Suppose  E(X)  = 6 and  E(Y)  = 4.  Find 

(a)  E (2X+4Y)  Answer:  28 

(b)  [E(2X)]2  Answer:  144 

(c)  $\sqrt{E(Y)}$  Answer:  2

(d)  E(5Y-X)  Answer:  14 

Exercise 3.4. Prove the following, assuming $E(X) = \bar{X}$ and
$E(Y) = \bar{Y}$:

(a) $E(X - \bar{X}) = 0$

(b) $E(aX - bY) + cE(Y) = a\bar{X} + (c - b)\bar{Y}$

(c) $E[a(X - \bar{X}) + b(Y - \bar{Y})] = 0$

(d) $E(X + a)^2 = E(X^2) + 2a\bar{X} + a^2$

(e) $E(X - \bar{X})^2 = E(X^2) - \bar{X}^2$

(f) $E(aX + bY) = 0$ for any values of a and b if $E(X) = 0$ and $E(Y) = 0$

3.3  EXPECTED  VALUE  OF  AN  ESTIMATE


Theorem 3.3 will now be used to find the expected value of the mean
of a simple random sample of n elements selected without replacement from
a population of N elements. The term "simple random sample" implies equal
probability of selection without replacement. The sample average is

$$\bar{x} = \frac{x_1 + \cdots + x_n}{n}$$

where $x_i$ is the value of X for the $i^{th}$ element in the sample. Without
loss of generality, we can consider the subscript of x as corresponding
to the $i^{th}$ draw; i.e., $x_1$ is the value of X obtained on the first
draw, $x_2$ the value on the second, etc. As each $x_i$ is a random variable,
$\bar{x}$ is a linear combination of random variables. Therefore, Theorem 3.3
applies and

$$E(\bar{x}) = \frac{1}{n}[E(x_1) + \cdots + E(x_n)]$$

In the previous chapter, Section 2.6, we found that any given element of
the population had a chance of $\frac{1}{N}$ of being selected on the
$i^{th}$ draw. This means that $x_i$ is a random variable that has a
probability equal to $\frac{1}{N}$ of being equal to any value of the
population set $X_1, \ldots, X_N$. Therefore,

$$E(x_1) = E(x_2) = \cdots = E(x_n) = \bar{X}$$

Hence, $E(\bar{x}) = \frac{\bar{X} + \cdots + \bar{X}}{n} = \bar{X}$. The fact
that $E(\bar{x}) = \bar{X}$ is one of the very important properties of an
average from a simple random sample. Incidentally, $E(\bar{x}) = \bar{X}$
whether the sampling is with or without replacement.
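The property $E(\bar{x}) = \bar{X}$ can be seen by listing every possible simple random sample from a small population. The sketch below uses a hypothetical population of five values and samples of three, neither of which comes from the text. Since every sample selected without replacement is equally likely, the expected value of the sample mean is the simple average of the sample means.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 3, 8, 10, 13]   # hypothetical values of X
n = 3                            # sample size

# Every simple random sample of n elements is equally likely.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]

expected_mean = sum(sample_means) / len(samples)               # E(x-bar)
population_mean = Fraction(sum(population), len(population))   # X-bar

print(len(samples), expected_mean, population_mean)
```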

Definition 3.3. A parameter is a quantity computed from all values
in a population set. The total of X, the average of X, the proportion of
elements for which $X_i < A$, or any other quantity computed from measurements
including all elements of the population is a parameter. The numerical
value of a parameter is usually unknown but it exists by definition.

Definition 3.4. An estimator is a mathematical formula or rule for
making an estimate from a sample. The formula for a sample average,
$\bar{x} = \frac{\sum x_i}{n}$, is a simple example of an estimator. It
provides an estimate of the parameter $\bar{X} = \frac{\sum X_i}{N}$.

Definition 3.5. An estimate is unbiased when its expected value
equals the parameter that it is an estimate of. In the above example,
$\bar{x}$ is an unbiased estimate of $\bar{X}$ because
$E(\bar{x}) = \bar{X}$.

Exercise 3.5. Assume a population of only four elements having values
of X as follows: $X_1 = 2$, $X_2 = 5$, $X_3 = 4$, $X_4 = 6$. For simple random
samples of size 2 show that the estimator $N\bar{x}$ provides an unbiased
estimate of the population total, $\sum X_i = 17$. List all six possible
samples of two and calculate $N\bar{x}$ for each. This will give the set of
values that the random variable $N\bar{x}$ can be equal to. Consider the
probability of each of the possible values of $N\bar{x}$ and show
arithmetically that $E(N\bar{x}) = 17$.

A sample of elements from a population is not always selected by
using equal probabilities of selection. Sampling with unequal probability
is complicated when the sampling is without replacement, so we will limit
our discussion to sampling with replacement.

Illustration 3.7. The set of four elements and the associated
probabilities used in Exercise 3.1 will serve as an example of unbiased
estimation when samples of two elements are selected with unequal
probability and with replacement. Our estimator of the population total,
2+5+4+6 = 17, will be

$$x' = \frac{\sum_{i=1}^{n} \frac{x_i}{p_i}}{n}$$

The estimate $x'$ is a random variable. Listed below are the set of values
that $x'$ can equal and the probability of each value occurring.

    Possible Samples      $x'_j$      $P_j$
    $x_1$    $x_1$           6         4/36
    $x_1$    $x_2$          10.5       8/36
    $x_1$    $x_3$          15         4/36
    $x_1$    $x_4$          21         4/36
    $x_2$    $x_2$          15         4/36
    $x_2$    $x_3$          19.5       4/36
    $x_2$    $x_4$          25.5       4/36
    $x_3$    $x_3$          24         1/36
    $x_3$    $x_4$          30         2/36
    $x_4$    $x_4$          36         1/36

Exercise 3.6. Verify the above values of $x'_j$ and $P_j$ and find the
expected value of $x'$. By definition $E(x') = \sum P_j x'_j$. Your answer
should be 17 because $x'$ is an unbiased estimate of the population total.

To put sampling with replacement and unequal probabilities in a
general setting, assume the population is $X_1, \ldots, X_j, \ldots, X_N$ and
the selection probabilities are $P_1, \ldots, P_j, \ldots, P_N$. Let $x_i$ be
the value of X for the $i^{th}$ element in a sample of n elements and let
$p_i$ be the probability which that element had of being selected. Then

$$x' = \frac{\sum_{i=1}^{n} \frac{x_i}{p_i}}{n}$$

is an unbiased estimate of the population total. We will now show that
$E(x') = \sum_{j=1}^{N} X_j$.

To facilitate comparison of $x'$ with u in Theorem 3.3, $x'$ may be
written as follows:

$$x' = \frac{1}{n}\left(\frac{x_1}{p_1}\right) + \cdots + \frac{1}{n}\left(\frac{x_n}{p_n}\right)$$

It is now clear that $a_i = \frac{1}{n}$ and $u_i = \frac{x_i}{p_i}$.
Therefore,

$$E(x') = \frac{1}{n}\left[E\left(\frac{x_1}{p_1}\right) + \cdots + E\left(\frac{x_n}{p_n}\right)\right] \tag{3.5}$$

The quantity $\frac{x_1}{p_1}$, which is the outcome of the first random
selection from the population, is a random variable that might be equal to
any one of the set of values
$\frac{X_1}{P_1}, \ldots, \frac{X_j}{P_j}, \ldots, \frac{X_N}{P_N}$. The
probability that $\frac{x_1}{p_1}$ equals $\frac{X_j}{P_j}$ is $P_j$.
Therefore, by definition

$$E\left(\frac{x_1}{p_1}\right) = \sum_{j}^{N} P_j \frac{X_j}{P_j} = \sum_{j}^{N} X_j$$

Since the sampling is with replacement it is clear that any
$\frac{x_i}{p_i}$ is the same random variable as $\frac{x_1}{p_1}$.
Therefore Equation (3.5) becomes

$$E(x') = \frac{1}{n}\left[\sum_{j}^{N} X_j + \cdots + \sum_{j}^{N} X_j\right]$$

Since there are n terms in the series it follows that

$$E(x') = \sum_{j}^{N} X_j$$
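The general result can be checked by brute force for a small case. The sketch below assumes a hypothetical population of three values with unequal selection probabilities; these numbers are not from the text. It enumerates every ordered sample of n = 2 drawn with replacement, weights the estimate $x'$ by the probability of the sample, and compares the expected value with the population total.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population values X_j and selection probabilities P_j.
values = [10, 20, 70]
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
n = 2   # sample size, drawn with replacement

expected_estimate = Fraction(0)
for draws in product(range(len(values)), repeat=n):   # every ordered sample
    prob_of_sample = Fraction(1)
    estimate = Fraction(0)
    for j in draws:
        prob_of_sample *= probs[j]                     # draws are independent
        estimate += Fraction(values[j]) / probs[j]     # x_i / p_i
    estimate /= n                                      # x' = (1/n) * sum of x_i / p_i
    expected_estimate += prob_of_sample * estimate

print(expected_estimate, sum(values))
```

The same loop works for any n, although the number of ordered samples grows quickly, so it is useful only as a check on small cases.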

Exercise 3.7. As a corollary show that the expected value of
$\frac{x'}{N}$ is equal to the population mean.

By this time, you should be getting familiar with the idea that an
estimate from a probability sample is a random variable. Persons
responsible for the design and selection of samples and for making estimates
from samples are concerned about the set of values, and associated
probabilities, that an estimate from a sample might be equal to.

Definition 3.6. The distribution of an estimate generated by
probability sampling is the sampling distribution of the estimate.

The values of $x'_j$ and $P_j$ in the numerical Illustration 3.7 are an
example of a sampling distribution. Statisticians are primarily
interested in three characteristics of a sampling distribution: (1) the mean
(center) of the sampling distribution in relation to the value of the
parameter being estimated, (2) a measure of the variation of possible
values of an estimate from the mean of the sampling distribution, and
(3) the shape of the sampling distribution. We have been discussing the
first. When the expected value of an estimate equals the parameter being
estimated, we know that the mean of the sampling distribution is equal to
the parameter estimated. But, in practice, values of parameters are
generally not known. To judge the accuracy of an estimate, we need
information on all three characteristics of the sampling distribution.
Let us turn now to the generally accepted measure of variation of a random
variable.

3.4  VARIANCE  OF  A  RANDOM  VARIABLE

The variance of a random variable, X, is the average value of the squares
of the deviation of X from its mean; that is, the average value of
$(X - \bar{X})^2$. The square root of the variance is the standard deviation
(error) of the variable.

Definition 3.7. In terms of expected values, the variance of a random
variable, X, is $E(X - \bar{X})^2$ where $E(X) = \bar{X}$. Since X is a random
variable, $(X - \bar{X})^2$ is a random variable and by definition of expected
value,

$$E(X - \bar{X})^2 = \sum_{i}^{N} P_i (X_i - \bar{X})^2$$

In case $P_i = \frac{1}{N}$ we have the more familiar formula for variance,
namely,

$$E(X - \bar{X})^2 = \frac{\sum_{i}^{N} (X_i - \bar{X})^2}{N} = \sigma_X^2$$

Commonly used symbols for variance include: $\sigma^2$, $\sigma_X^2$, $V^2$,
$S^2$, Var(X) and V(X). Variance is often defined as
$\frac{\sum (X_i - \bar{X})^2}{N-1}$. This will be discussed
in Section 3.7.
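For a concrete case, the variance in Definition 3.7 can be computed for the 12 elements of Illustration 3.1 with $P_i = 1/N$. The sketch below also evaluates the alternative form with divisor N-1 so the two can be compared; the variable names are only for this example.

```python
from fractions import Fraction

values = [3, 9, 3, 5, 5, 3, 4, 3, 10, 3, 8, 4]   # X_i from Illustration 3.1
N = len(values)
probs = [Fraction(1, N)] * N                      # P_i = 1/N

# Mean and variance as expected values.
mean = sum(p * x for x, p in zip(values, probs))
variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))

# The alternative definition with divisor N - 1.
variance_n_minus_1 = sum((x - mean) ** 2 for x in values) / (N - 1)

print(mean, variance, variance_n_minus_1)
```

For this set the sum of squared deviations is 72, so the two forms give 72/12 and 72/11.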


3.4.1  VARIANCE  OF  THE  SUM  OF  TWO  INDEPENDENT  RANDOM  VARIABLES 

Two random variables, X and Y, are independent if the joint probability,
$P_{ij}$, of getting $X_i$ and $Y_j$ is equal to $(P_i)(P_j)$, where $P_i$ is
the probability of selecting $X_i$ from the set of values of X, and $P_j$ is
the probability of selecting $Y_j$ from the set of values of Y. The variance
of the sum of two independent random variables is the sum of the variance of
each. That is,

$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$$


Illustration 3.8. In Illustration 3.3, X and Y were independent. We
had listed all possible values of $X_i+Y_j$ and the probability of each. From
that listing we can readily compute the variance of X+Y. By definition

$$\sigma_{X+Y}^2 = E[(X+Y)-(\bar{X}+\bar{Y})]^2 = \sum_i\sum_j P_iP_j[(X_i+Y_j)-(\bar{X}+\bar{Y})]^2 \tag{3.6}$$

Substituting in Equation (3.6) we have

$$\sigma_{X+Y}^2 = \tfrac{2}{24}(0-5.5)^2 + \tfrac{4}{24}(4-5.5)^2 + \cdots + \tfrac{1}{24}(10-5.5)^2 = \tfrac{85}{12}$$

The variances of X and Y are computed as follows:

$$\sigma_X^2 = E(X-\bar{X})^2 = \tfrac{2}{6}(2-4)^2 + \tfrac{2}{6}(5-4)^2 + \tfrac{1}{6}(4-4)^2 + \tfrac{1}{6}(6-4)^2 = \tfrac{7}{3}$$

$$\sigma_Y^2 = E(Y-\bar{Y})^2 = \tfrac{1}{4}(-2-1.5)^2 + \tfrac{2}{4}(2-1.5)^2 + \tfrac{1}{4}(4-1.5)^2 = \tfrac{19}{4}$$

We now have $\sigma_X^2 + \sigma_Y^2 = \tfrac{7}{3} + \tfrac{19}{4} = \tfrac{85}{12}$ which verifies the above statement that
the variance of the sum of two independent random variables is the sum of
the variances.
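
The arithmetic of Illustration 3.8 can be checked mechanically. The following is a minimal sketch, assuming the distributions of X and Y are the ones implied by the variance computations above (values 2, 5, 4, 6 with probabilities 2/6, 2/6, 1/6, 1/6 for X, and values -2, 2, 4 with probabilities 1/4, 2/4, 1/4 for Y). The function names are illustrative only.

```python
from fractions import Fraction as F
from itertools import product

# Distributions read off the variance computations in Illustration 3.8.
X = {2: F(2, 6), 5: F(2, 6), 4: F(1, 6), 6: F(1, 6)}
Y = {-2: F(1, 4), 2: F(2, 4), 4: F(1, 4)}

def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(p * v for v, p in dist.items())

def variance(dist):
    """E(X - mean)^2 for a discrete distribution."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

def sum_of_independent(dx, dy):
    """Distribution of X+Y when the joint probability is P_i * P_j."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

var_x = variance(X)
var_y = variance(Y)
var_sum = variance(sum_of_independent(X, Y))
print(var_x, var_y, var_sum)
```

Exact fractions are used so that the comparison of the variance of X+Y with the sum of the two variances is not affected by rounding.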

Exercise 3.8. Prove that $E[(X+Y)-(\bar{X}+\bar{Y})]^2 = E(X+Y)^2 - (\bar{X}+\bar{Y})^2$. Then
calculate the variance of X+Y in Illustration 3.3 by using the formula
$\sigma_{X+Y}^2 = E(X+Y)^2 - (\bar{X}+\bar{Y})^2$. The answer should agree with the result obtained
in Illustration 3.8.

Exercise 3.9. Refer to Illustration 3.3 and the listing of possible
values of X + Y and the probability of each. Instead of $X_i+Y_j$ list the
products $(X_i-\bar{X})(Y_j-\bar{Y})$ and show that $E(X_i-\bar{X})(Y_j-\bar{Y}) = 0$.

Exercise 3.10. Find $E(X-\bar{X})(Y-\bar{Y})$ for the numerical example used in
Illustration 3.3 by the formula $E(XY) - \bar{X}\bar{Y}$ which was derived in
Illustration 3.6.

3.4.2 VARIANCE OF THE SUM OF TWO DEPENDENT RANDOM VARIABLES

The variance of the sum of two dependent random variables involves
covariance, which is defined as follows:

Definition 3.8. The covariance of two random variables, X and Y, is
$E(X-\bar{X})(Y-\bar{Y})$ where $E(X) = \bar{X}$ and $E(Y) = \bar{Y}$. By definition of expected value

$$E(X-\bar{X})(Y-\bar{Y}) = \sum_i\sum_j P_{ij}(X_i-\bar{X})(Y_j-\bar{Y})$$

where the summation is over all possible values of X and Y.

Symbols commonly used for covariance are $\sigma_{XY}$, $S_{XY}$, and Cov(X,Y).

Since $(X+Y) - (\bar{X}+\bar{Y}) = (X-\bar{X}) + (Y-\bar{Y})$ we can derive a formula for the
variance of X+Y as follows:

$$\sigma_{X+Y}^2 = E[(X+Y)-(\bar{X}+\bar{Y})]^2$$
$$= E[(X-\bar{X}) + (Y-\bar{Y})]^2$$
$$= E[(X-\bar{X})^2 + (Y-\bar{Y})^2 + 2(X-\bar{X})(Y-\bar{Y})]$$

Then, according to Theorem 3.3,

$$\sigma_{X+Y}^2 = E(X-\bar{X})^2 + E(Y-\bar{Y})^2 + 2E(X-\bar{X})(Y-\bar{Y})$$

and by definition we obtain,

$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$$

Sometimes $\sigma_{XX}$ is used instead of $\sigma_X^2$ to represent variance. Thus

$$\sigma_{X+Y}^2 = \sigma_{XX} + \sigma_{YY} + 2\sigma_{XY}$$


For two independent random variables, $P_{ij} = P_iP_j$. Therefore

$$E(X-\bar{X})(Y-\bar{Y}) = \sum_i\sum_j P_iP_j(X_i-\bar{X})(Y_j-\bar{Y})$$

Write out in longhand, if necessary, and be satisfied that the following
is correct:

$$\sum_i\sum_j P_iP_j(X_i-\bar{X})(Y_j-\bar{Y}) = \Big[\sum_i P_i(X_i-\bar{X})\Big]\Big[\sum_j P_j(Y_j-\bar{Y})\Big] = 0 \tag{3.7}$$

which proves that the covariance $\sigma_{XY}$ is zero when X and Y are independent.

Notice that in Equation (3.7) $\sum_i P_i(X_i-\bar{X}) = E(X-\bar{X})$ and $\sum_j P_j(Y_j-\bar{Y}) = E(Y-\bar{Y})$
which, for independent random variables, proves that $E(X-\bar{X})(Y-\bar{Y}) = E(X-\bar{X})E(Y-\bar{Y})$.
When working with independent random variables the following
important theorem is frequently very useful:

Theorem 3.4. The expected value of the product of independent random
variables $u_1, u_2, \ldots, u_k$ is the product of their expected values:

$$E(u_1u_2\cdots u_k) = E(u_1)E(u_2)\cdots E(u_k)$$

3.5 VARIANCE OF AN ESTIMATE

The variance of an estimate from a probability sample depends upon
the method of sampling. We will derive the formula for the variance of $\bar{x}$,
the mean of a random sample selected with equal probability, with and
without replacement. Then, the variance of an estimate of the population
total will be derived for sampling with replacement and unequal probability
of selection.

3.5.1 EQUAL PROBABILITY OF SELECTION

The variance of $\bar{x}$, the mean of a random sample of n elements selected
with equal probabilities and with replacement from a population of N, is:

$$Var(\bar{x}) = \frac{\sigma_X^2}{n} \quad \text{where} \quad \sigma_X^2 = \frac{\sum_i^N (X_i-\bar{X})^2}{N}$$

The proof follows:

By definition, $Var(\bar{x}) = E[\bar{x}-E(\bar{x})]^2$. We have shown that $E(\bar{x}) = \bar{X}$. Therefore,
$Var(\bar{x}) = E(\bar{x}-\bar{X})^2$. By substitution and algebraic manipulation, we obtain

$$Var(\bar{x}) = E\Big[\frac{x_1+\cdots+x_n}{n} - \bar{X}\Big]^2$$
$$= E\Big[\frac{(x_1-\bar{X})+\cdots+(x_n-\bar{X})}{n}\Big]^2$$
$$= \frac{1}{n^2} E\Big[\sum_{i=1}^n (x_i-\bar{X})^2 + \sum_{i\neq j}\sum (x_i-\bar{X})(x_j-\bar{X})\Big]$$

Applying Theorem 3.3 we now obtain

$$Var(\bar{x}) = \frac{1}{n^2}\Big[\sum_{i=1}^n E(x_i-\bar{X})^2 + \sum_{i\neq j}\sum E(x_i-\bar{X})(x_j-\bar{X})\Big] \tag{3.8}$$

In series form, Equation (3.8) can be written as

$$Var(\bar{x}) = \frac{1}{n^2}\big[E(x_1-\bar{X})^2 + E(x_2-\bar{X})^2 + \cdots + E(x_1-\bar{X})(x_2-\bar{X}) + E(x_1-\bar{X})(x_3-\bar{X}) + \cdots\big]$$


Since the sampling is with replacement $x_i$ and $x_j$ are independent and
the expected value of all of the product terms is zero. For example,
$E(x_1-\bar{X})(x_2-\bar{X}) = E(x_1-\bar{X})E(x_2-\bar{X})$ and we know that $E(x_1-\bar{X})$ and $E(x_2-\bar{X})$ are
zero. Next, consider $E(x_1-\bar{X})^2$. We have already shown that $x_1$ is a
random variable that can be equal to any one of the population set of
values $X_1,\ldots,X_N$ with equal probability. Therefore

$$E(x_1-\bar{X})^2 = \frac{\sum_i^N (X_i-\bar{X})^2}{N} = \sigma_X^2$$

The same argument applies to $x_2$, $x_3$, etc. Therefore,
$\sum_i^n E(x_i-\bar{X})^2 = \sigma_X^2 + \cdots + \sigma_X^2 = n\sigma_X^2$ and Equation (3.8) reduces to $Var(\bar{x}) = \frac{\sigma_X^2}{n}$.


The mathematics for finding the variance of $\bar{x}$ when the sampling is
without replacement is the same as sampling with replacement down to and
including Equation (3.8). The expected value of a product term in Equation
(3.8) is not zero because $x_i$ and $x_j$ are not independent. For example, on
the first draw an element has a probability of $\frac{1}{N}$ of being selected, but
on the second draw the probability is conditioned by the fact that the
element selected on the first draw was not replaced. Consider the first
product term in Equation (3.8). To find $E(x_1-\bar{X})(x_2-\bar{X})$ we need to consider
the set of values that $(x_1-\bar{X})(x_2-\bar{X})$ could be equal to. Reference to the
following matrix is helpful:

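$$\begin{matrix}
(X_1-\bar{X})^2 & (X_1-\bar{X})(X_2-\bar{X}) & \cdots & (X_1-\bar{X})(X_N-\bar{X}) \\
(X_2-\bar{X})(X_1-\bar{X}) & (X_2-\bar{X})^2 & \cdots & (X_2-\bar{X})(X_N-\bar{X}) \\
\vdots & \vdots & & \vdots \\
(X_N-\bar{X})(X_1-\bar{X}) & (X_N-\bar{X})(X_2-\bar{X}) & \cdots & (X_N-\bar{X})^2
\end{matrix}$$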

The random variable $(x_1-\bar{X})(x_2-\bar{X})$ has an equal probability of being any of
the products in the above matrix, except for the squared terms on the main
diagonal. There are N(N-1) such products. Therefore,

$$E(x_1-\bar{X})(x_2-\bar{X}) = \frac{\sum_{i\neq j}^N\sum^N (X_i-\bar{X})(X_j-\bar{X})}{N(N-1)}$$

According to Equation (1.9) in Chapter 1,

$$\sum_{i\neq j}^N\sum^N (X_i-\bar{X})(X_j-\bar{X}) = -\sum_i^N (X_i-\bar{X})^2$$

Hence,

$$E(x_1-\bar{X})(x_2-\bar{X}) = -\frac{\sum_i^N (X_i-\bar{X})^2}{N(N-1)} = -\frac{\sigma_X^2}{N-1}$$

The same evaluation applies to all other product terms in Equation (3.8).
There are n(n-1) product terms in Equation (3.8) and the expected value of
each is $-\frac{\sigma_X^2}{N-1}$.

Thus, Equation (3.8) becomes

$$Var(\bar{x}) = \frac{1}{n^2}\Big[\sum_i^n E(x_i-\bar{X})^2 - n(n-1)\frac{\sigma_X^2}{N-1}\Big]$$

Recognizing that $E(x_i-\bar{X})^2 = \sigma_X^2$ and after some easy algebraic operations
the answer as follows is obtained:

$$Var(\bar{x}) = \frac{N-n}{N-1}\,\frac{\sigma_X^2}{n} \tag{3.9}$$

The factor $\frac{N-n}{N-1}$ is called the correction for finite population because it
does not appear when infinite populations are involved or when sampling
with replacement which is equivalent to sampling from an infinite population.
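
Both results for the variance of the sample mean can be confirmed by listing every possible sample from a very small population. The sketch below is only an illustration: the population values and the sample size are assumptions chosen to keep the enumeration short, and the helper name `var_of_sample_mean` is hypothetical.

```python
from fractions import Fraction as F
from itertools import product, permutations

population = [2, 5, 3, 6]   # illustrative values (an assumption)
N, n = len(population), 2

pop_mean = F(sum(population), N)
sigma2 = sum((v - pop_mean) ** 2 for v in population) / N   # divisor N

def var_of_sample_mean(samples):
    """Variance of the sample mean over a list of equally likely samples."""
    means = [F(sum(s), len(s)) for s in samples]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)

# Ordered samples: N**n with replacement, N(N-1)...(N-n+1) without.
with_repl = var_of_sample_mean(list(product(population, repeat=n)))
without_repl = var_of_sample_mean(list(permutations(population, n)))

print(with_repl, sigma2 / n)                          # with replacement
print(without_repl, F(N - n, N - 1) * sigma2 / n)     # Equation (3.9)
```

Each printed pair should agree: the first with the with-replacement formula and the second with the finite population correction applied.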

For two characteristics, X and Y, of elements in the same simple random
sample, the covariance of $\bar{x}$ and $\bar{y}$ is given by a formula analogous to
Equation (3.9); namely,

$$Cov(\bar{x},\bar{y}) = \frac{N-n}{N-1}\,\frac{\sigma_{XY}}{n} \tag{3.10}$$

3.5.2 UNEQUAL PROBABILITY OF SELECTION


In Section 3.3 we proved that $x' = \frac{\sum_i^n \frac{x_i}{p_i}}{n}$ is an unbiased estimate
of the population total. This was for sampling with replacement and
unequal probability of selection. We will now proceed to find the
variance of x'.

By definition $Var(x') = E[x' - E(x')]^2$. Let $X = \sum_i^N X_i$. Then since
$E(x') = X$, it follows that

$$Var(x') = E\Big[\frac{\frac{x_1}{p_1}+\cdots+\frac{x_n}{p_n}}{n} - X\Big]^2 = \frac{1}{n^2} E\Big[\Big(\frac{x_1}{p_1} - X\Big)+\cdots+\Big(\frac{x_n}{p_n} - X\Big)\Big]^2$$
$$= \frac{1}{n^2} E\Big[\sum_i^n \Big(\frac{x_i}{p_i} - X\Big)^2 + \sum_{i\neq k}\sum \Big(\frac{x_i}{p_i} - X\Big)\Big(\frac{x_k}{p_k} - X\Big)\Big]$$

Applying Theorem 3.3, Var(x') becomes

$$Var(x') = \frac{1}{n^2}\Big[\sum_i^n E\Big(\frac{x_i}{p_i} - X\Big)^2 + \sum_{i\neq k}\sum E\Big(\frac{x_i}{p_i} - X\Big)\Big(\frac{x_k}{p_k} - X\Big)\Big] \tag{3.11}$$

Notice the similarity of Equations (3.8) and (3.11) and that the steps
leading to these two equations were the same. Again, since the sampling
is with replacement, the expected value of all product terms in Equation
(3.11) is zero. Therefore Equation (3.11) becomes

$$Var(x') = \frac{1}{n^2}\Big[\sum_i^n E\Big(\frac{x_i}{p_i} - X\Big)^2\Big]$$

By definition $E\Big(\frac{x_i}{p_i} - X\Big)^2 = \sum_i^N P_i\Big(\frac{X_i}{P_i} - X\Big)^2$

Therefore

$$Var(x') = \frac{\sum_i^N P_i\Big(\frac{X_i}{P_i} - X\Big)^2}{n} \tag{3.12}$$
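
Equation (3.12) can be checked numerically by enumerating every ordered sample drawn with replacement and computing the variance of x' directly from its sampling distribution. This is a minimal sketch: the population values and selection probabilities below are hypothetical (they are not the ones used in Exercise 3.1), and the variable names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical population (not from the text): values X_i and selection
# probabilities P_i, which must sum to 1.
X_vals = [12, 20, 8, 40]
P = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
n = 2
X_total = sum(X_vals)

# Equation (3.12): Var(x') = sum_i P_i (X_i/P_i - X)^2 / n
var_formula = sum(p * (x / p - X_total) ** 2 for x, p in zip(X_vals, P)) / n

# Direct check: enumerate every ordered sample of n draws with replacement.
units = list(zip(X_vals, P))
e_est = F(0)      # accumulates E(x')
e_est_sq = F(0)   # accumulates E(x'^2)
for sample in product(units, repeat=n):
    prob = F(1)
    for _, p in sample:
        prob *= p                      # independent draws
    est = sum(x / p for x, p in sample) / n
    e_est += prob * est
    e_est_sq += prob * est ** 2
var_direct = e_est_sq - e_est ** 2

print(e_est, var_formula, var_direct)
```

The first printed value should equal the population total, confirming that x' is unbiased, and the last two should be identical.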


Exercise 3.11. (a) Refer to Exercise 3.1 and compute the variance
of x' for samples of two (that is, n = 2) using Equation (3.12). (b) Then
turn to Illustration 3.7 and compute the variance of x' from the actual
values of x'. Don't overlook the fact that the values of x' have unequal
probabilities. According to Definition 3.7, the variance of x' is
$\sum_j^{10} P_j(x'_j - X)^2$ where $X = E(x')$, $x'_j$ is one of the 10 possible values of x',
and $P_j$ is the probability of $x'_j$.

3.6  VARIANCE  OF  A LINEAR  COMBINATION 


Before presenting a general theorem on the variance of a linear
combination of random variables, a few key variance and covariance
relationships will be given. In the following equations X and Y are random
variables and a, b, c, and d are constants:

$$Var(X+a) = Var(X)$$
$$Var(aX) = a^2 Var(X)$$
$$Var(aX+b) = a^2 Var(X)$$
$$Cov(X+a,\,Y+b) = Cov(X,Y)$$
$$Cov(aX,\,bY) = ab\,Cov(X,Y)$$
$$Cov(aX+b,\,cY+d) = ac\,Cov(X,Y)$$
$$Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)$$
$$Var(X+Y+a) = Var(X+Y)$$
$$Var(aX+bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X,Y)$$

Illustration 3.9. The above relationships are easily verified by
using the theory of expected values. For example,

$$Var(aX+b) = E[aX+b-E(aX+b)]^2$$
$$= E[aX+b-E(aX)-E(b)]^2$$
$$= E[aX-aE(X)]^2$$
$$= E[a(X-\bar{X})]^2$$
$$= a^2E(X-\bar{X})^2 = a^2Var(X)$$

Exercise 3.12. As in Illustration 3.9 use the theory of expected
values to prove that

$$Cov(aX+b,\,cY+d) = ac\,Cov(X,Y)$$

As in Theorem 3.3, let $u = a_1u_1+\cdots+a_ku_k$ where $a_1,\ldots,a_k$ are constants
and $u_1,\ldots,u_k$ are random variables. By definition the variance of u is

$$Var(u) = E[u-E(u)]^2$$

By substitution

$$Var(u) = E[a_1u_1+\cdots+a_ku_k - E(a_1u_1+\cdots+a_ku_k)]^2$$
$$= E[a_1(u_1-\bar{u}_1)+\cdots+a_k(u_k-\bar{u}_k)]^2 \quad \text{where } E(u_i) = \bar{u}_i$$

By squaring the quantity in [ ] and considering the expected values of
the terms in the series, the following result is obtained.

Theorem 3.5. The variance of u, a linear combination of random
variables, is given by the following equation

$$Var(u) = \sum_i^k a_i^2\sigma_i^2 + \sum_{i\neq j}\sum a_ia_j\sigma_{ij}$$

where $\sigma_i^2$ is the variance of $u_i$ and $\sigma_{ij}$ is the covariance of $u_i$ and $u_j$.

Theorems 3.3 and 3.5 are very useful because many estimates from
probability samples are linear combinations of random variables.

Illustration 3.10. Suppose for a srs (simple random sample) that
data have been obtained for two characteristics X and Y, the sample
values being $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$. What is the variance of $\bar{x}-\bar{y}$?

From the theory and results that have been presented one can proceed
immediately to write the answer. From Theorem 3.5 we know that
$Var(\bar{x}-\bar{y}) = Var(\bar{x}) + Var(\bar{y}) - 2Cov(\bar{x},\bar{y})$. From the sampling specifications we know the
variances of $\bar{x}$ and $\bar{y}$ and the covariance. See Equations (3.9) and (3.10).
Thus, the following result is easily obtained:

$$Var(\bar{x}-\bar{y}) = \Big(\frac{N-n}{N-1}\Big)\Big(\frac{1}{n}\Big)(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) \tag{3.13}$$

Some readers might be curious about the relationship between
covariance and correlation. By definition the correlation between X and Y is

$$r_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{\sigma_{XY}}{\sigma_X\sigma_Y}$$

Therefore, one could substitute $r_{XY}\sigma_X\sigma_Y$ for $\sigma_{XY}$ in Equation (3.13).

Exercise 3.13. In a statistical publication suppose you find 87
bushels per acre as the yield of corn in State A and 83 is the estimated
yield for State B. The estimated standard errors are given as 1.5 and
2.0 bushels. You become interested in the standard error of the
difference in yield between the two States and want to know how large the
estimated difference is in relation to its standard error. Find the
standard error of the difference. You may assume that the two yield
estimates are independent because the sample selection in one State was
completely independent of the other. Answer: 2.5.
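
The answer to Exercise 3.13 follows from Theorem 3.5 with no covariance term, since the two estimates are independent. A short sketch of the arithmetic, using only the figures quoted in the exercise (the variable names are illustrative):

```python
import math

yield_a, yield_b = 87.0, 83.0   # estimated yields, bushels per acre
se_a, se_b = 1.5, 2.0           # estimated standard errors

# Independent estimates: Var(a - b) = Var(a) + Var(b).
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)

# Size of the estimated difference relative to its standard error.
ratio = (yield_a - yield_b) / se_diff
print(se_diff, ratio)
```

The difference of 4 bushels is therefore 1.6 times its standard error.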

Illustration 3.11. No doubt students who are familiar with sampling
have already recognized the application of Theorems 3.3 and 3.5 to several
sampling plans and methods of estimation. For example, for stratified
random sampling, an estimator of the population total is

$$x' = N_1\bar{x}_1 + \cdots + N_k\bar{x}_k = \sum_i N_i\bar{x}_i$$

where $N_i$ is the population number of sampling units in the $i^{th}$ stratum
and $\bar{x}_i$ is the average per sampling unit of characteristic X from a sample
of $n_i$ sampling units from the $i^{th}$ stratum. According to Theorem 3.3

$$E(x') = E\sum_i N_i\bar{x}_i = \sum_i N_iE(\bar{x}_i)$$

If the sampling is such that $E(\bar{x}_i) = \bar{X}_i$ for all strata, x' is an unbiased
estimate of the population total. According to Theorem 3.5

$$Var(x') = N_1^2Var(\bar{x}_1) + \cdots + N_k^2Var(\bar{x}_k) \tag{3.14}$$

There are no covariance terms in Equation (3.14) because the sample selection
in one stratum is independent of another stratum. Assuming a srs from each
stratum, Equation (3.14) becomes

$$Var(x') = N_1^2\,\frac{N_1-n_1}{N_1-1}\,\frac{\sigma_1^2}{n_1} + \cdots + N_k^2\,\frac{N_k-n_k}{N_k-1}\,\frac{\sigma_k^2}{n_k}$$

where $\sigma_i^2$ is the variance of X among sampling units within the $i^{th}$
stratum.

Illustration 3.12. Suppose $x'_1,\ldots,x'_k$ are independent estimates of
the same quantity, T. That is, $E(x'_i) = T$. Let $\sigma_i^2$ be the variance of $x'_i$.
Consider a weighted average of the estimates, namely

$$x' = w_1x'_1 + \cdots + w_kx'_k \tag{3.15}$$

where $\sum_i w_i = 1$. Then

$$E(x') = w_1E(x'_1) + \cdots + w_kE(x'_k) = T \tag{3.16}$$

That is, for any set of weights where $\sum w_i = 1$ the expected value of x' is
T. How should the weights be chosen?

The variance of x' is

$$Var(x') = w_1^2\sigma_1^2 + \cdots + w_k^2\sigma_k^2$$

If we weight the estimates equally, $w_i = \frac{1}{k}$ and the variance of x' is

$$Var(x') = \frac{1}{k}\Big[\frac{\sum_i \sigma_i^2}{k}\Big] \tag{3.17}$$

which is the average variance divided by k. However, it is reasonable to
give more weight to estimates having low variance. Using differential
calculus we can find the weights which will minimize the variance of x'.
The optimum weights are inversely proportional to the variances of the
estimates. That is, $w_i \propto \frac{1}{\sigma_i^2}$.

As an example, suppose one has two independent unbiased estimates of
the same quantity which originate from two different samples. The optimum
weighting of the two estimates would be

$$x' = \frac{\frac{1}{\sigma_1^2}x'_1 + \frac{1}{\sigma_2^2}x'_2}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}$$

As another example, suppose $x'_1,\ldots,x'_k$ are the values of X in a sample
of k sampling units selected with equal probability and with replacement.
In this case each $x'_i$ is an unbiased estimate of $\bar{X}$. If we let $w_i = \frac{1}{k}$, x'
is $\bar{x}$, the simple average of the sample values. Notice, as one would expect,
Equation (3.16) reduces to $E(\bar{x}) = \bar{X}$. Also, since each estimate, $x'_i$, is the
same random variable that could be equal to any value in the set $X_1,\ldots,X_N$,
it is clear that all of the $\sigma_i^2$'s must be equal to $\sigma^2 = \frac{\sum (X_i-\bar{X})^2}{N}$. Hence,
Equation (3.17) reduces to $\frac{\sigma^2}{k}$ which agrees with the first part of Section
3.5.1.
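
The gain from weighting independent estimates inversely to their variances can be seen with a small numerical sketch. The two variances below are hypothetical, and the function names are illustrative; the only formulas used are the variance of a weighted average of independent estimates and the inverse-variance rule stated in Illustration 3.12.

```python
from fractions import Fraction as F

def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, scaled so they sum to 1."""
    inv = [1 / v for v in variances]
    total = sum(inv)
    return [w / total for w in inv]

def variance_of_weighted_average(weights, variances):
    """Var(x') = sum of w_i^2 * sigma_i^2 for independent estimates."""
    return sum(w * w * v for w, v in zip(weights, variances))

variances = [F(4), F(1)]                 # hypothetical sigma_1^2, sigma_2^2
equal = [F(1, 2), F(1, 2)]
optimum = inverse_variance_weights(variances)

var_equal = variance_of_weighted_average(equal, variances)
var_optimum = variance_of_weighted_average(optimum, variances)
print(optimum, var_equal, var_optimum)
```

With these figures the more precise estimate receives four times the weight of the other, and the variance of the weighted average falls below the variance of either estimate taken alone.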


Exercise 3.14. If you equate $x'_i$ in Equation (3.15) with $\frac{x_i}{p_i}$ in
Section 3.5.2 and let $w_i = \frac{1}{n}$ and k = n, then x' in Equation (3.15) is the
same as $x' = \frac{\sum \frac{x_i}{p_i}}{n}$ in Section 3.5.2. Show that in this case Equation (3.17)
becomes the same as Equation (3.12).

3.7  ESTIMATION  OF  VARIANCE 

All of the variance formulas presented in previous sections have
involved calculations from a population set of values. In practice, we
have data for only a sample. Hence, we must consider means of estimating
variances from sample data.

3.7.1  SIMPLE  RANDOM  SAMPLING 

In Section 3.5.1, we found that the variance of the mean of a srs is

$$Var(\bar{x}) = \frac{N-n}{N-1}\,\frac{\sigma_X^2}{n} \tag{3.18}$$

where

$$\sigma_X^2 = \frac{\sum_i^N (X_i-\bar{X})^2}{N}$$

As an estimator of $\sigma_X^2$, $\frac{\sum_i^n (x_i-\bar{x})^2}{n}$ seems like a natural first choice for
consideration. However, when sampling finite populations, it is customary
to define variance among units of the population as follows:

$$S^2 = \frac{\sum_i^N (X_i-\bar{X})^2}{N-1}$$

and to use

$$s^2 = \frac{\sum_i^n (x_i-\bar{x})^2}{n-1}$$

as an estimator of $S^2$. A reason for this
will become apparent when we find the expected value of $s^2$ as follows:

The formula for $s^2$ can be written in a form that is more convenient
for finding $E(s^2)$. Thus,

$$s^2 = \frac{\sum_i^n (x_i-\bar{x})^2}{n-1} = \frac{\sum_i^n x_i^2 - n\bar{x}^2}{n-1}$$

and

$$E(s^2) = \frac{1}{n-1}\Big[\sum_i^n E(x_i^2) - nE(\bar{x}^2)\Big]$$

We have shown previously that $x_i$ is a random variable that has an equal
probability of being any value in the set $X_1,\ldots,X_N$. Therefore

$$E(x_i^2) = \frac{\sum_i^N X_i^2}{N} \quad \text{and} \quad \sum_i^n E(x_i^2) = \frac{n\sum_i^N X_i^2}{N}$$

Hence,

$$E(s^2) = \frac{n}{n-1}\Big[\frac{\sum_i^N X_i^2}{N} - E(\bar{x}^2)\Big] \tag{3.19}$$


We know, by definition, that $\sigma_{\bar{x}}^2 = E(\bar{x}-\bar{X})^2$ and it is easy to show that

$$E(\bar{x}-\bar{X})^2 = E(\bar{x}^2) - \bar{X}^2$$

Therefore,

$$E(\bar{x}^2) = \sigma_{\bar{x}}^2 + \bar{X}^2$$

By substitution in Equation (3.19) we obtain

$$E(s^2) = \frac{n}{n-1}\Big[\frac{\sum_i^N X_i^2}{N} - \bar{X}^2 - \sigma_{\bar{x}}^2\Big]$$

By definition $\sigma_X^2 = \frac{\sum_i^N (X_i-\bar{X})^2}{N} = \frac{\sum_i^N X_i^2}{N} - \bar{X}^2$ and since the specified method of
sampling was srs, $\sigma_{\bar{x}}^2 = \frac{N-n}{N-1}\,\frac{\sigma_X^2}{n}$, we have

$$E(s^2) = \frac{n}{n-1}\Big[\sigma_X^2 - \frac{N-n}{N-1}\,\frac{\sigma_X^2}{n}\Big]$$

which after simplification is

$$E(s^2) = \frac{N}{N-1}\,\sigma_X^2$$

Note from the above definitions of $\sigma_X^2$ and $S^2$ that

$$S^2 = \frac{N}{N-1}\,\sigma_X^2$$

Therefore

$$E(s^2) = S^2$$
Therefore 


Since $s^2$ is an unbiased estimate of $S^2$, we will now substitute $\frac{N-1}{N}S^2$ for
$\sigma_X^2$ in Equation (3.18) which gives

$$Var(\bar{x}) = \frac{N-n}{N}\,\frac{S^2}{n} \tag{3.20}$$

Both Equations, (3.18) and (3.20), for the $Var(\bar{x})$ give identical results
and both agree with $E(\bar{x}-\bar{X})^2$ as a definition of variance. We have shown
that $s^2$ is an unbiased estimate of $S^2$. Substituting $s^2$ for $S^2$ in Equation
(3.20) we have

$$var(\bar{x}) = \frac{N-n}{N}\,\frac{s^2}{n} \tag{3.21}$$

as an estimate of the variance of $\bar{x}$. With regard to Equation (3.18),
$\frac{N-1}{N}s^2$ is an unbiased estimate of $\sigma_X^2$. When $\frac{N-1}{N}s^2$ is substituted for
$\sigma_X^2$, Equation (3.21) is obtained.

Since in Equation (3.20), $\frac{N-n}{N}$ is exactly 1 minus the sampling fraction
and $s^2$ is an unbiased estimate of $S^2$, there is some advantage to using
Equation (3.20) and $S^2 = \frac{\sum (X_i-\bar{X})^2}{N-1}$ as a definition of variance among
sampling units in the population.

Exercise 3.15. For a small population of 4 elements suppose the
values of X are $X_1 = 2$, $X_2 = 5$, $X_3 = 3$, and $X_4 = 6$. Consider simple
random samples of size 2. There are six possible samples.

(a) For each of the six samples calculate $\bar{x}$ and $s^2$. That is,
find the sampling distribution of $\bar{x}$ and the sampling
distribution of $s^2$.

(b) Calculate $S^2$, then find $Var(\bar{x})$ using Equation (3.20).

(c) Calculate the variance among the six values of $\bar{x}$ and compare
the result with $Var(\bar{x})$ obtained in (b). The results should
be the same.

(d) From the sampling distribution of $s^2$ calculate $E(s^2)$ and
verify that $E(s^2) = S^2$.
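
The enumeration asked for in Exercise 3.15 is small enough to carry out by hand, but it can also be sketched in a few lines as a check on one's own work. The population values are those given in the exercise; the variable names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations

population = [2, 5, 3, 6]          # X_1, ..., X_4 from Exercise 3.15
N, n = len(population), 2

pop_mean = F(sum(population), N)
S2 = sum((v - pop_mean) ** 2 for v in population) / (N - 1)   # divisor N-1

# The six equally likely simple random samples of size 2.
samples = list(combinations(population, n))
means = [F(sum(s), n) for s in samples]
s2_values = [sum((v - m) ** 2 for v in s) / (n - 1)
             for s, m in zip(samples, means)]

# (b) Equation (3.20), and (c) variance among the six sample means.
var_formula = F(N - n, N) * S2 / n
var_among_means = sum((m - pop_mean) ** 2 for m in means) / len(means)

# (d) Expected value of s^2 over the sampling distribution.
expected_s2 = sum(s2_values) / len(s2_values)

print(S2, var_formula, var_among_means, expected_s2)
```

The second and third printed values should agree, and the last should equal the first.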

3.7.2  UNEQUAL  PROBABILITY  OF  SELECTION 

In Section 3.5.2, we derived a formula for the variance of the
estimator x' where

$$x' = \frac{\sum_i^n \frac{x_i}{p_i}}{n} \tag{3.22}$$

The sampling was with unequal selection probabilities and with replacement.
We found that the variance of x' was given by

$$Var(x') = \frac{\sum_i^N P_i\Big(\frac{X_i}{P_i} - X\Big)^2}{n} \tag{3.23}$$

As a formula for estimating Var(x') from a sample one might be inclined,
as a first guess, to try a formula of the same form as Equation (3.23) but
that does not work. Equation (3.23) is a weighted average of the squares
of deviations $\Big(\frac{X_i}{P_i} - X\Big)^2$ which reflects the unequal selection probabilities.
If one applied the same weighting system in a formula for estimating
variance from a sample he would in effect be applying the weights twice;
first, in the selection process itself and second, to the sample data.
The unequal probability of selection is already incorporated into the
sample itself.

As in some of the previous discussion, look at the estimator as follows:

$$x' = \frac{x'_1 + \cdots + x'_n}{n} \quad \text{where} \quad x'_i = \frac{x_i}{p_i}$$

Each $x'_i$ is an independent unbiased estimate of the population total. Since
each value of $x'_i$ receives an equal weight in determining x' it appears that
the following formula for estimating Var(x') might work:

$$var(x') = \frac{s^2}{n} \tag{3.24}$$

where

$$s^2 = \frac{\sum_i^n (x'_i - x')^2}{n-1}$$

By following an approach similar to that used in Section 3.7.1, one can
prove that

$$E(s^2) = \sum_i^N P_i\Big(\frac{X_i}{P_i} - X\Big)^2$$

That is, Equation (3.24) does provide an unbiased estimate of Var(x') in
Equation (3.23). The proof is left as an exercise.
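
Short of a proof, the unbiasedness of Equation (3.24) can be confirmed for a particular case by enumeration. The sketch below assumes a hypothetical population and set of selection probabilities (not those of Exercise 3.1) and compares the expected value of var(x') over all ordered samples with the variance given by Equation (3.23). The function name `var_estimate` is illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical population values and selection probabilities (assumptions).
X_vals = [12, 20, 8, 40]
P = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
n = 2
X_total = sum(X_vals)

# Equation (3.23): the variance being estimated.
var_true = sum(p * (x / p - X_total) ** 2 for x, p in zip(X_vals, P)) / n

def var_estimate(sample):
    """Equation (3.24): s^2/n, with s^2 computed from x'_i = x_i/p_i."""
    xp = [x / p for x, p in sample]
    centre = sum(xp) / len(xp)                      # this is x'
    s2 = sum((v - centre) ** 2 for v in xp) / (len(xp) - 1)
    return s2 / len(xp)

# Expected value of var(x') over all ordered samples with replacement;
# samples do not occur with equal frequency, so each is weighted by its
# probability.
expected_estimate = F(0)
for sample in product(list(zip(X_vals, P)), repeat=n):
    prob = F(1)
    for _, p in sample:
        prob *= p
    expected_estimate += prob * var_estimate(sample)

print(var_true, expected_estimate)
```

Note that no selection probabilities are applied as weights inside `var_estimate`; they enter only through the values $x_i/p_i$ and through the chance that each sample is drawn.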

Exercise 3.16. Reference is made to Exercise 3.1, Illustration 3.7,
and Exercise 3.11. In Illustration 3.7 the sampling distribution of x'
(See Equation (3.22)) is given for samples of 2 from the population of
4 elements that was given in Exercise 3.1.

(a) Compute $var(x') = \frac{s^2}{n}$ (Equation (3.24)) for each of the 10
possible samples.

(b) Compute the expected value of var(x') and compare it with the
result obtained in Exercise 3.11. The results should be the
same. Remember, when finding the expected value of var(x'),
that the $x'_j$'s do not occur with equal frequency.

3.8  RATIO  OF  TWO  RANDOM  VARIABLES 


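In sampling theory and practice one frequently encounters estimates
that are ratios of random variables. It was pointed out earlier that
$E\big(\frac{u}{w}\big) \neq \frac{E(u)}{E(w)}$ where u and w are random variables. Formulas for the expected
value of a ratio and for the variance of a ratio will now be presented
without derivation. The formulas are approximations:

$$E\Big(\frac{u}{w}\Big) \approx \frac{\bar{u}}{\bar{w}} + \frac{\bar{u}}{\bar{w}}\Big[\frac{\sigma_w^2}{\bar{w}^2} - \frac{\rho_{uw}\sigma_u\sigma_w}{\bar{u}\bar{w}}\Big] \tag{3.25}$$

$$Var\Big(\frac{u}{w}\Big) \approx \Big[\frac{\bar{u}}{\bar{w}}\Big]^2\Big[\frac{\sigma_u^2}{\bar{u}^2} + \frac{\sigma_w^2}{\bar{w}^2} - \frac{2\rho_{uw}\sigma_u\sigma_w}{\bar{u}\bar{w}}\Big] \tag{3.26}$$

where

$\bar{u} = E(u)$
$\bar{w} = E(w)$
$\sigma_u^2 = E(u-\bar{u})^2$
$\sigma_w^2 = E(w-\bar{w})^2$

and $\rho_{uw} = \frac{\sigma_{uw}}{\sigma_u\sigma_w}$ where $\sigma_{uw} = E(u-\bar{u})(w-\bar{w})$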


For a discussion of the conditions under which Equations (3.25) and
(3.26) are good approximations, reference is made to Hansen, Hurwitz, and
Madow. 2/ The conditions are usually satisfied with regard to estimates
from sample surveys. As a rule of thumb the variance formula is usually
accepted as satisfactory if the coefficient of variation of the variable
in the denominator is less than 0.1; that is, if $\frac{\sigma_w}{\bar{w}} < 0.1$. In other words,
this condition states that the coefficient of variation of the estimate in
the denominator should be less than 10 percent. A larger coefficient of
variation might be tolerable before becoming concerned about Equation (3.26)
as an approximation.

The condition $\frac{\sigma_w}{\bar{w}} < 0.1$ is more stringent than necessary for regarding
the bias of a ratio as negligible. With few exceptions in practice the
bias of a ratio is ignored. Some of the logic for this will appear in
the illustration below. To summarize, the conditions when Equations (3.25)
and (3.26) are not good approximations are such that the ratio is likely to
be of questionable value owing to large variance.

If u and w are linear combinations of random variables, the theory
presented in previous sections applies to u and to w. Assuming u and w
are estimates from a sample, to estimate $Var\big(\frac{u}{w}\big)$ take into account the
sample design and substitute in Equation (3.26) estimates of $\bar{u}$, $\bar{w}$, $\sigma_u^2$, $\sigma_w^2$,
and $\rho_{uw}$. Ignore Equation (3.25) unless there is reason to believe the bias
of the ratio might be important relative to its standard error.
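
The substitution described above is mechanical once estimates of the five quantities are in hand. The following is a minimal sketch of the approximations for the expected value and variance of a ratio, written in terms of the covariance $\sigma_{uw} = \rho_{uw}\sigma_u\sigma_w$. The function name, argument names, and the numerical inputs are all hypothetical; the sketch also reports the coefficient of variation of the denominator so that the 0.1 rule of thumb can be checked.

```python
import math

def ratio_approximations(u_bar, w_bar, var_u, var_w, cov_uw):
    """Approximate E(u/w) and Var(u/w) for a ratio of two estimates.

    cov_uw is rho_uw * sigma_u * sigma_w. Returns (expected value,
    variance, coefficient of variation of the denominator).
    """
    r = u_bar / w_bar
    rel_var_u = var_u / u_bar ** 2
    rel_var_w = var_w / w_bar ** 2
    rel_cov = cov_uw / (u_bar * w_bar)
    # E(u/w) ~ r + r * [rel_var_w - rel_cov]; second term is the bias.
    expected = r + r * (rel_var_w - rel_cov)
    # Var(u/w) ~ r^2 * [rel_var_u + rel_var_w - 2 * rel_cov]
    variance = r ** 2 * (rel_var_u + rel_var_w - 2 * rel_cov)
    cv_w = math.sqrt(var_w) / abs(w_bar)
    return expected, variance, cv_w

# Hypothetical estimates, for illustration only.
expected, variance, cv_w = ratio_approximations(
    u_bar=50.0, w_bar=100.0, var_u=4.0, var_w=25.0, cov_uw=5.0)
print(expected, variance, cv_w)
```

In this made-up case the coefficient of variation of the denominator is 0.05, inside the rule of thumb, and the bias term (0.00075) is small relative to the standard error of the ratio (about 0.023).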

It is of interest to note the similarity between Var(u-w) and $Var\big(\frac{u}{w}\big)$.
According to Theorem 3.5,

$$Var(u-w) = \sigma_u^2 + \sigma_w^2 - 2\rho_{uw}\sigma_u\sigma_w$$

By definition the relative variance of an estimate is the variance of the
estimate divided by the square of its expected value. Thus, in terms of
the relative variance of a ratio, Equation (3.26) can be written

$$Rel\ Var\Big(\frac{u}{w}\Big) = \frac{\sigma_u^2}{\bar{u}^2} + \frac{\sigma_w^2}{\bar{w}^2} - 2\rho_{uw}\frac{\sigma_u\sigma_w}{\bar{u}\bar{w}}$$

The similarity is an aid to remembering the formula for $Var\big(\frac{u}{w}\big)$.

2/ Hansen, Hurwitz, and Madow, Sample Survey Methods and Theory,
Volume I, Chapter 4, John Wiley and Sons, 1953.

Illustration 3.13. Suppose one has a simple random sample of n
elements from a population of N. Let $\bar{x}$ and $\bar{y}$ be the sample means for
characteristics X and Y. Then, $u = \bar{x}$, $w = \bar{y}$,

$$\sigma_u^2 = \frac{N-n}{N}\,\frac{S_X^2}{n} \quad \text{and} \quad \sigma_w^2 = \frac{N-n}{N}\,\frac{S_Y^2}{n}$$

Notice that the condition discussed above, $\frac{\sigma_w}{\bar{w}} < 0.1$, is satisfied if the
sample is large enough so

$$\frac{N-n}{N}\,\frac{S_Y^2}{n\bar{Y}^2} < (0.1)^2$$

Substituting in Equation (3.26) we obtain the following as the variance of
the ratio:

$$Var\Big(\frac{\bar{x}}{\bar{y}}\Big) = \Big(\frac{N-n}{N}\Big)\Big(\frac{1}{n}\Big)\frac{\bar{X}^2}{\bar{Y}^2}\Big[\frac{S_X^2}{\bar{X}^2} + \frac{S_Y^2}{\bar{Y}^2} - \frac{2\rho_{XY}S_XS_Y}{\bar{X}\bar{Y}}\Big]$$

The bias of $\frac{\bar{x}}{\bar{y}}$ as an estimate of $\frac{\bar{X}}{\bar{Y}}$ is given by the second term of
Equation (3.25). For this illustration it becomes

$$\Big(\frac{N-n}{N}\Big)\Big(\frac{1}{n}\Big)\frac{\bar{X}}{\bar{Y}}\Big[\frac{S_Y^2}{\bar{Y}^2} - \frac{\rho_{XY}S_XS_Y}{\bar{X}\bar{Y}}\Big]$$

As the size of the sample increases, the bias decreases as $\frac{1}{n}$ whereas the
standard error of the ratio decreases at a slower rate, namely $\frac{1}{\sqrt{n}}$.
Thus, we need not be concerned about a possibility of the bias becoming
important relative to sampling error as the size of the sample increases.
A possible exception occurs when several ratios are combined. An example
is stratified random sampling when many strata are involved and separate
ratio estimates are made for the strata. This is discussed in the books
on sampling.

3.9  CONDITIONAL  EXPECTATION 


The  theory  for  conditional  expectation  and  conditional  variance  of  a 
random  variable  is  a very  important  part  of  sampling  theory,  especially 
in  the  theory  for  multistage  sampling.  The  theory  will  be  discussed  with 
reference  to  two-stage  sampling. 

The  notation  that  will  be  used  in  this  and  the  next  section  is  as 
follows : 

M is the number of psu's (primary sampling units) in the population.

m is the number of psu's in the sample.

$N_i$ is the total number of elements in the $i^{th}$ psu.

$N = \sum_i^M N_i$ is the total number of elements in the population.

$n_i$ is the sample number of elements from the $i^{th}$ psu.

$n = \sum_i^m n_i$ is the total number of elements in the sample.

$\bar{n} = \frac{n}{m}$

$X_{ij}$ is the value of X for the $j^{th}$ element in the $i^{th}$ psu. It
refers to an element in the population, that is, $j = 1,\ldots,N_i$,
and $i = 1,\ldots,M$.

$x_{ij}$ is the value of X for the $j^{th}$ element in the sample from the
$i^{th}$ psu in the sample, that is, the indexes i and j refer to
the set of psu's and elements in the sample.

$X_{i\cdot} = \sum_j^{N_i} X_{ij}$ is the population total for the $i^{th}$ psu.

$\bar{X}_{i\cdot} = \frac{X_{i\cdot}}{N_i}$ is the average of X for all elements in the $i^{th}$ psu.

$\bar{X}_{\cdot\cdot} = \frac{\sum_i^M\sum_j^{N_i} X_{ij}}{N} = \frac{\sum_i^M X_{i\cdot}}{N}$ is the average of all N elements.

$\bar{X}_{\cdot} = \frac{\sum_i^M X_{i\cdot}}{M}$ is the average of the psu totals. Be sure to note the
difference between $\bar{X}_{\cdot\cdot}$ and $\bar{X}_{\cdot}$.

$x_{i\cdot} = \sum_j^{n_i} x_{ij}$ is the sample total for the $i^{th}$ psu in the sample.

$\bar{x}_{i\cdot} = \frac{x_{i\cdot}}{n_i}$ is the average for the $n_i$ elements in the sample from
the $i^{th}$ psu.

$\bar{x}_{\cdot\cdot} = \frac{\sum_i^m\sum_j^{n_i} x_{ij}}{n}$ is the average for all elements in the sample.


Assume simple random sampling, equal probability of selection without
replacement, at both stages. Consider the sample of $n_i$ elements from the
$i^{th}$ psu. We know from Section 3.3 that $\bar{x}_{i\cdot}$ is an unbiased estimate of the
psu mean $\bar{X}_{i\cdot}$; that is, $E(\bar{x}_{i\cdot}) = \bar{X}_{i\cdot}$ and for a fixed i (a specified psu)
$E(N_i\bar{x}_{i\cdot}) = N_iE(\bar{x}_{i\cdot}) = N_i\bar{X}_{i\cdot} = X_{i\cdot}$. However, owing to the first stage of sampling,
$E(N_i\bar{x}_{i\cdot})$ must be treated as a random variable. Hence, it is necessary to
become involved with the expected value of an expected value.

First, consider $X$ as a random variable, in the context of single-stage sampling, which could equal any one of the values $X_{ij}$ in the population set of $N = \sum_{i}^{M} N_i$. Let $P(ij)$ be the probability of selecting the $j$th element in the $i$th psu; that is, $P(ij)$ is the probability of $X$ being equal to $X_{ij}$. By definition

$$E(X) = \sum_{i}^{M}\sum_{j}^{N_i} P(ij)\,X_{ij} \tag{3.27}$$


Now consider the selection of an element as a two-step procedure: (1) select a psu with probability $P(i)$, and (2) select an element within the selected psu with probability $P(j|i)$. In words, $P(j|i)$ is the probability of selecting the $j$th element in the $i$th psu given that the $i$th psu has already been selected. Thus, $P(ij) = P(i)P(j|i)$. By substitution, Equation (3.27) becomes

$$E(X) = \sum_{i}^{M}\sum_{j}^{N_i} P(i)P(j|i)\,X_{ij}$$

or

$$E(X) = \sum_{i}^{M} P(i) \sum_{j}^{N_i} P(j|i)\,X_{ij} \tag{3.28}$$


By definition, $\sum_{j}^{N_i} P(j|i)X_{ij}$ is the expected value of $X$ for a fixed value of $i$. It is called "conditional expectation."

Let $E_2(X|i) = \sum_{j}^{N_i} P(j|i)X_{ij}$, where $E_2(X|i)$ is the form of notation we will be using to designate conditional expectation. To repeat, $E_2(X|i)$ means the expected value of $X$ for a fixed $i$. The subscript 2 indicates that the conditional expectation applies to the second stage of sampling. $E_1$ and $E_2$ will refer to expectation at the first and second stages, respectively.

Substituting $E_2(X|i)$ in Equation (3.28) we obtain

$$E(X) = \sum_{i}^{M} P(i)\,E_2(X|i) \tag{3.29}$$

There is one value of $E_2(X|i)$ for each of the $M$ psu's. In fact $E_2(X|i)$ is a random variable where the probability of $E_2(X|i)$ is $P(i)$. Thus the right-hand side of Equation (3.29) is, by definition, the expected value of $E_2(X|i)$. This leads to the following theorem:

Theorem 3.6. $E(X) = E_1E_2(X|i)$

Suppose $P(j|i) = \dfrac{1}{N_i}$ and $P(i) = \dfrac{1}{M}$. Then,

$$E_2(X|i) = \sum_{j}^{N_i} \left(\frac{1}{N_i}\right) X_{ij} = \bar{X}_{i\cdot}$$

and

$$E(X) = E_1(\bar{X}_{i\cdot}) = \sum_{i}^{M} \left(\frac{1}{M}\right)(\bar{X}_{i\cdot}) = \frac{\sum_{i}^{M} \bar{X}_{i\cdot}}{M}$$

In this case $E(X)$ is an unweighted average of the psu averages. It is important to note that, if $P(i)$ and $P(j|i)$ are chosen in such a way that $P(ij)$ is constant, every element has the same chance of selection. This point will be discussed later.
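As a numerical check of Theorem 3.6, the following minimal sketch compares the direct expectation of Equation (3.27) with the nested form $E_1E_2(X|i)$. The three-psu population and the function name `expected_value` are hypothetical choices made only for illustration; exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical population (an assumption for illustration): three psu's of unequal size.
psus = [[2, 1], [6, 4, 7], [8, 11, 9]]
M = len(psus)
N = sum(len(psu) for psu in psus)


def expected_value(p_psu, p_within):
    """Return (direct E(X), nested E1 E2(X|i)) for P(i) = p_psu[i], P(j|i) = p_within[i][j]."""
    # Equation (3.27) with P(ij) = P(i) P(j|i): sum over every element in the population.
    direct = sum(p_psu[i] * p_within[i][j] * x
                 for i, psu in enumerate(psus)
                 for j, x in enumerate(psu))
    # Conditional expectation E2(X|i) for each psu, then its expectation over psu's (3.29).
    e2 = [sum(p_within[i][j] * x for j, x in enumerate(psu))
          for i, psu in enumerate(psus)]
    nested = sum(p_psu[i] * e2[i] for i in range(M))
    return direct, nested


# Equal probability within each psu: P(j|i) = 1/N_i.
within = [[Fraction(1, len(psu))] * len(psu) for psu in psus]

# Case in the text: P(i) = 1/M gives the unweighted average of the psu averages.
direct, nested = expected_value([Fraction(1, M)] * M, within)

# Contrast: P(i) = N_i/N makes P(ij) = 1/N constant, so E(X) is the population mean.
direct_pps, nested_pps = expected_value([Fraction(len(psu), N) for psu in psus], within)

print(direct, nested, direct_pps, nested_pps)
```

With these assumed values the first case yields the unweighted mean of the three psu averages, while the second case yields the mean of all eight elements, which illustrates why a constant $P(ij)$ matters.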

Theorem 3.3 dealt with the expected value of a linear combination of random variables. There is a corresponding theorem for conditional expectation. Assume the linear combination is

$$U = a_1u_1 + \cdots + a_ku_k = \sum_{t=1}^{k} a_tu_t$$

where $a_1, \ldots, a_k$ are constants and $u_1, \ldots, u_k$ are random variables. Let $E(U|c_i)$ be the expected value of $U$ under a specified condition, $c_i$, where $c_i$ is one of the conditions out of a set of $M$ conditions that could occur. The theorem on conditional expectation can then be stated symbolically as follows:

Theorem 3.7. $E(U|c_i) = a_1E(u_1|c_i) + \cdots + a_kE(u_k|c_i)$

or $E(U|c_i) = \sum_{t}^{k} a_tE(u_t|c_i)$


Compare Theorems 3.7 and 3.3 and note that Theorem 3.7 is like Theorem 3.3 except that conditional expectation is applied. Assume $c_i$ is a random event and that the probability of the event $c_i$ occurring is $P(i)$. Then $E(U|c_i)$ is a random variable and by definition the expected value of $E(U|c_i)$ is $\sum_{i}^{M} P(i)E(U|c_i)$, which is $E(U)$. Thus, we have the following theorem:

Theorem 3.8. The expected value of $U$ is the expected value of the conditional expected value of $U$, which in symbols is written as follows:

$$E(U) = E\,E(U|c_i) \tag{3.30}$$

Substituting the value of $E(U|c_i)$ from Theorem 3.7 in Equation (3.30) we have

$$E(U) = E[a_1E(u_1|c_i) + \cdots + a_kE(u_k|c_i)] = E\left[\sum_{t}^{k} a_tE(u_t|c_i)\right] \tag{3.31}$$

Illustration 3.14. Assume two-stage sampling with simple random sampling at both stages. Let $x'$, defined as follows, be the estimator of the population total:

$$x' = \frac{M}{m}\sum_{i}^{m} \frac{N_i}{n_i}\sum_{j}^{n_i} x_{ij} \tag{3.32}$$
Exercise 3.17. Examine the estimator, $x'$, Equation (3.32). Express it in other forms that might help show its logical structure. For example, for a fixed $i$ what is $\dfrac{N_i}{n_i}\sum_{j}^{n_i} x_{ij}$? Does it seem like a reasonable way of estimating the population total?

To display $x'$ as a linear combination of random variables it is convenient to express it in the following form:

$$x' = \left[\frac{M}{m}\frac{N_1}{n_1}x_{11} + \cdots + \frac{M}{m}\frac{N_1}{n_1}x_{1n_1}\right] + \cdots + \left[\frac{M}{m}\frac{N_m}{n_m}x_{m1} + \cdots + \frac{M}{m}\frac{N_m}{n_m}x_{mn_m}\right] \tag{3.33}$$


Suppose we want to find the expected value of $x'$ to determine whether it is equal to the population total. According to Theorem 3.8,

$$E(x') = E_1E_2(x'|i) \tag{3.34}$$

$$E(x') = E_1E_2\left\{\left[\frac{M}{m}\sum_{i}^{m}\frac{N_i}{n_i}\sum_{j}^{n_i} x_{ij}\right]\Big|\, i\right\} \tag{3.35}$$


Equations (3.34) and (3.35) are obtained simply by substituting $x'$ as the random variable in (3.30). The $c_i$ now refers to any one of the $m$ psu's in the sample. First we must solve the conditional expectation, $E_2(x'|i)$. Since $\dfrac{M}{m}$ and $\dfrac{N_i}{n_i}$ are constant with respect to the conditional expectation, and making use of Theorem 3.7, we can write

$$E_2(x'|i) = \frac{M}{m}\sum_{i}^{m}\frac{N_i}{n_i}\sum_{j}^{n_i} E_2(x_{ij}|i) \tag{3.36}$$

We know for any given psu in the sample that $x_{ij}$ is an element in a simple random sample from the psu and according to Section 3.3 its expected value is the psu mean, $\bar{X}_{i\cdot}$. That is,

$$E_2(x_{ij}|i) = \bar{X}_{i\cdot}$$

and

$$\sum_{j}^{n_i} E_2(x_{ij}|i) = n_i\bar{X}_{i\cdot} \tag{3.37}$$


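Substituting the result from Equation (3.37) in Equation (3.36) gives

$$E_2(x'|i) = \frac{M}{m}\sum_{i}^{m} N_i\bar{X}_{i\cdot} \tag{3.38}$$

Next we need to find the expected value of $E_2(x'|i)$. In Equation (3.38), $N_i$ is a random variable, as well as $\bar{X}_{i\cdot}$, associated with the first stage of sampling. Accordingly, we will take $X_{i\cdot} = N_i\bar{X}_{i\cdot}$ as the random variable, which gives in lieu of Equation (3.38)

$$E_2(x'|i) = \frac{M}{m}\sum_{i}^{m} X_{i\cdot}$$

Therefore,

$$E(x') = E_1\left[\frac{M}{m}\sum_{i}^{m} X_{i\cdot}\right]$$

From Theorem 3.3

$$E_1\left[\frac{M}{m}\sum_{i}^{m} X_{i\cdot}\right] = \frac{M}{m}\sum_{i}^{m} E_1(X_{i\cdot})$$

Since $E_1(X_{i\cdot}) = \dfrac{\sum_{i}^{M} X_{i\cdot}}{M}$,

$$E_1\left[\frac{M}{m}\sum_{i}^{m} X_{i\cdot}\right] = \sum_{i}^{M} X_{i\cdot}$$

Therefore, $E(x') = \sum_{i}^{M} X_{i\cdot}$. This shows that $x'$ is an unbiased estimator of the population total.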

3.10  CONDITIONAL  VARIANCE 

Conditional variance refers to the variance of a variable under a specified condition or limitation. It is related to conditional probability and to conditional expectation.
To find the variance of $x'$ (see Equation (3.32) or (3.33)) the following important theorem will be used:

Theorem 3.9. The variance of $x'$ is given by

$$V(x') = V_1E_2(x'|i) + E_1V_2(x'|i)$$

where $V_1$ is the variance for the first stage of sampling and $V_2$ is the "conditional" variance for the second stage.

We have discussed $E_2(x'|i)$ and noted there is one value of $E_2(x'|i)$ for each psu in the population. Hence $V_1E_2(x'|i)$ is simply the variance of the $M$ values of $E_2(x'|i)$.

In Theorem 3.9 the conditional variance, $V_2(x'|i)$, by definition is

$$V_2(x'|i) = E_2\{[x' - E_2(x'|i)]^2 \,|\, i\}$$

To understand $V_2(x'|i)$ think of $x'$ as a linear combination of random variables (see Equation (3.33)). Consider the variance of $x'$ when $i$ is held constant. All terms (random variables) in the linear combination are now constant except those originating from sampling within the $i$th psu. Therefore, $V_2(x'|i)$ is associated with variation among elements in the $i$th psu. $V_2(x'|i)$ is a random variable with $M$ values in the set, one for each psu. Therefore, $E_1V_2(x'|i)$ by definition is

$$E_1V_2(x'|i) = \sum_{i}^{M} P(i)\,V_2(x'|i)$$

That is, $E_1V_2(x'|i)$ is an average of $M$ values of $V_2(x'|i)$ weighted by $P(i)$, the probability that the $i$th psu had of being in the sample.

Three illustrations of the application of Theorem 3.9 will be given. In each case there will be five steps in finding the variance of $x'$:

Step 1, find $E_2(x'|i)$

Step 2, find $V_1E_2(x'|i)$

Step 3, find $V_2(x'|i)$

Step 4, find $E_1V_2(x'|i)$

Step 5, combine results from Steps 2 and 4.

Illustration 3.15. This is a simple illustration, selected because we know what the answer is from previous discussion and a linear combination of random variables is not involved. Suppose $x'$ in Theorem 3.9 is simply the random variable $X$ where $X$ has an equal probability of being any one of the $X_{ij}$ values in the set of $N = \sum_{i}^{M} N_i$. We know that the variance of $X$ can be expressed as follows:

$$V(X) = \frac{1}{N}\sum_{i}^{M}\sum_{j}^{N_i} (X_{ij} - \bar{X}_{\cdot\cdot})^2 \tag{3.39}$$


In the case of two-stage sampling an equivalent method of selecting a value of $X$ is to select a psu first and then select an element within the psu, the condition being that $P(ij) = P(i)P(j|i) = \dfrac{1}{N}$. This condition is satisfied by letting $P(i) = \dfrac{N_i}{N}$ and $P(j|i) = \dfrac{1}{N_i}$. We now want to find $V(X)$ by using Theorem 3.9 and check the result with Equation (3.39).

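Step 1. From the random selection specifications we know that $E_2(x'|i) = \bar{X}_{i\cdot}$. Therefore,

Step 2. $V_1E_2(x'|i) = V_1(\bar{X}_{i\cdot})$

We know that $\bar{X}_{i\cdot}$ is a random variable that has a probability of $\dfrac{N_i}{N}$ of being equal to the $i$th value in the set $\bar{X}_{1\cdot}, \ldots, \bar{X}_{M\cdot}$. Therefore, by definition of the variance of a random variable,

$$V_1E_2(x'|i) = \sum_{i}^{M} \frac{N_i}{N}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 \tag{3.40}$$

where

$$\bar{X}_{\cdot\cdot} = \sum_{i}^{M} \frac{N_i}{N}\bar{X}_{i\cdot} = \frac{\sum_{i}^{M}\sum_{j}^{N_i} X_{ij}}{N}$$

Step 3. By definition

$$V_2(x'|i) = \sum_{j}^{N_i} \frac{1}{N_i}(X_{ij} - \bar{X}_{i\cdot})^2$$

Step 4. Since each value of $V_2(x'|i)$ has a probability $\dfrac{N_i}{N}$,

$$E_1V_2(x'|i) = \sum_{i}^{M} \frac{N_i}{N}\sum_{j}^{N_i} \frac{1}{N_i}(X_{ij} - \bar{X}_{i\cdot})^2 \tag{3.41}$$

Step 5. From Equations (3.40) and (3.41) we obtain

$$V(x') = \frac{1}{N}\left[\sum_{i}^{M} N_i(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 + \sum_{i}^{M}\sum_{j}^{N_i} (X_{ij} - \bar{X}_{i\cdot})^2\right] \tag{3.42}$$

The fact that Equations (3.42) and (3.39) are the same is verified by Equation (1.10) in Chapter I.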
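The agreement between Equations (3.39) and (3.42) can also be checked numerically. The sketch below does so for a small hypothetical population (the values in `psus` are an assumption, not data from the text), computing the between-psu term of Step 2 and the within-psu term of Step 4 separately.

```python
from fractions import Fraction

# Hypothetical X_ij values for three psu's (assumed for illustration only).
psus = [[2, 1], [6, 4, 7], [8, 11, 9]]
N = sum(len(psu) for psu in psus)
grand_mean = Fraction(sum(sum(psu) for psu in psus), N)

# Equation (3.39): variance of X over all N elements (divisor N).
total_var = sum((x - grand_mean) ** 2 for psu in psus for x in psu) / N

between = Fraction(0)  # V1 E2(x'|i), Equation (3.40)
within = Fraction(0)   # E1 V2(x'|i), Equation (3.41)
for psu in psus:
    N_i = len(psu)
    psu_mean = Fraction(sum(psu), N_i)
    # Each psu mean occurs with probability N_i/N.
    between += Fraction(N_i, N) * (psu_mean - grand_mean) ** 2
    # Step 3: conditional variance within the psu, then weighted by N_i/N.
    v2 = sum((x - psu_mean) ** 2 for x in psu) / N_i
    within += Fraction(N_i, N) * v2

# Step 5 (Theorem 3.9): the two components add up to the total variance.
print(total_var, between, within, between + within)
```

The design choice here is to keep the two components separate, because later illustrations reuse the same split into a first-stage and a second-stage term.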

Illustration 3.16. Find the variance of the estimator $x'$ given by Equation (3.32) assuming simple random sampling at both stages of sampling.

Step 1. Theorem 3.7 is applicable. That is,

$$E_2(x'|i) = \sum_{i}^{m}\sum_{j}^{n_i} E_2\left[\frac{M}{m}\frac{N_i}{n_i}x_{ij}\,\Big|\, i\right]$$

which means "sum the conditional expected values of each of the $n$ terms in Equation (3.33)."

With regard to any one of the terms in Equation (3.33), the conditional expectation is

$$E_2\left[\frac{M}{m}\frac{N_i}{n_i}x_{ij}\,\Big|\, i\right] = \frac{M}{m}\frac{N_i}{n_i}E_2(x_{ij}|i) = \frac{M}{m}\frac{N_i}{n_i}\bar{X}_{i\cdot} = \frac{M}{m}\frac{X_{i\cdot}}{n_i}$$

Therefore

$$E_2(x'|i) = \sum_{i}^{m}\sum_{j}^{n_i} \frac{M}{m}\frac{X_{i\cdot}}{n_i} \tag{3.43}$$

With reference to Equation (3.43) and summing with respect to $j$, we have

$$\sum_{j}^{n_i} \frac{M}{m}\frac{X_{i\cdot}}{n_i} = \frac{M}{m}X_{i\cdot}$$

Hence Equation (3.43) becomes

$$E_2(x'|i) = \frac{M}{m}\sum_{i}^{m} X_{i\cdot} \tag{3.44}$$

Step 2. Find $V_1E_2(x'|i)$. This is simple because $\dfrac{\sum_{i}^{m} X_{i\cdot}}{m}$ in Equation (3.44) is the mean of a random sample of $m$ from the set of psu totals $X_{1\cdot}, \ldots, X_{M\cdot}$. Therefore,

$$V_1E_2(x'|i) = M^2\left(\frac{M-m}{M-1}\right)\frac{\sigma_{b1}^2}{m} \tag{3.45}$$

where

$$\sigma_{b1}^2 = \frac{\sum_{i}^{M} (X_{i\cdot} - \bar{X}_{\cdot})^2}{M} \quad\text{and}\quad \bar{X}_{\cdot} = \frac{\sum_{i}^{M} X_{i\cdot}}{M}$$

In the subscript to $\sigma^2$, the "b" indicates between psu variance and "1" distinguishes this variance from between psu variances in later illustrations.


Step 3. Finding $V_2(x'|i)$ is more involved because the conditional variance of a linear combination of random variables must be derived. However, this is analogous to using Theorem 3.5 for finding the variance of a linear combination of random variables. Theorem 3.5 applies except that $V(u|i)$ replaces $V(u)$ and conditional variance and conditional covariance replace the variances and covariances in the formula for $V(u)$. As the solution proceeds, notice that the strategy is to shape the problem so previous results can be used.

Look at the estimator $x'$, Equation (3.33), and determine whether any covariances exist. An element selected from one psu is independent of an element selected from another; but within a psu the situation is the same as the one we had when finding the variance of the mean of a simple random sample. This suggests writing $x'$ in terms of $\bar{x}_{i\cdot}$ because the $\bar{x}_{i\cdot}$'s are independent. Accordingly, we will start with

$$x' = \frac{M}{m}\sum_{i}^{m} N_i\bar{x}_{i\cdot}$$

Hence

$$V_2(x'|i) = V_2\left\{\left[\frac{M}{m}\sum_{i}^{m} N_i\bar{x}_{i\cdot}\right]\Big|\, i\right\}$$

Since the $\bar{x}_{i\cdot}$'s are independent

$$V_2(x'|i) = \frac{M^2}{m^2}\sum_{i}^{m} V_2(N_i\bar{x}_{i\cdot}|i)$$

and since $N_i$ is constant with regard to the conditional variance

$$V_2(x'|i) = \frac{M^2}{m^2}\sum_{i}^{m} N_i^2\,V_2(\bar{x}_{i\cdot}|i) \tag{3.46}$$

Since the sampling within each psu is simple random sampling

$$V_2(\bar{x}_{i\cdot}|i) = \frac{N_i-n_i}{N_i-1}\,\frac{\sigma_i^2}{n_i} \tag{3.47}$$

where

$$\sigma_i^2 = \sum_{j}^{N_i} \frac{1}{N_i}(X_{ij} - \bar{X}_{i\cdot})^2$$


Step 4. After substituting the value of $V_2(\bar{x}_{i\cdot}|i)$ in Equation (3.46), and then applying Theorem 3.3, we have

$$E_1V_2(x'|i) = \frac{M^2}{m^2}\sum_{i}^{m} E_1\left[N_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{\sigma_i^2}{n_i}\right]$$

Since the first stage of sampling was simple random sampling and each psu had an equal chance of being in the sample,

$$E_1\left[N_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{\sigma_i^2}{n_i}\right] = \frac{1}{M}\sum_{i}^{M} N_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{\sigma_i^2}{n_i}$$

Hence

$$E_1V_2(x'|i) = \frac{M}{m}\sum_{i}^{M} N_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{\sigma_i^2}{n_i} \tag{3.48}$$

Step 5. Combining Equation (3.48) and Equation (3.45) the answer is

$$V(x') = M^2\,\frac{M-m}{M-1}\,\frac{\sigma_{b1}^2}{m} + \frac{M}{m}\sum_{i}^{M} N_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{\sigma_i^2}{n_i} \tag{3.49}$$
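Equation (3.49) can be checked by brute force on a small example. The sketch below enumerates every possible two-stage sample for a hypothetical population (the values in `psus`, the within-psu sample sizes `n_i`, and $m = 2$ are all assumptions made for illustration), computes the exact mean and variance of $x'$ from Equation (3.32), and compares the variance with the formula.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

# Hypothetical design (assumed for illustration only).
psus = [[2, 1], [6, 4, 7], [8, 11, 9]]  # X_ij values by psu
n_i = [1, 2, 2]                          # elements sampled if psu i is selected
M, m = len(psus), 2                      # m psu's sampled out of M


def pop_var(values):
    """Variance with divisor N, matching the sigma^2 definitions in the text."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / len(values)


# Exact sampling distribution of x' over all possible samples.
mean_est = Fraction(0)
mean_sq = Fraction(0)
for chosen in combinations(range(M), m):            # first stage: SRS of psu's
    within_samples = [list(combinations(psus[i], n_i[i])) for i in chosen]
    for subsamples in product(*within_samples):     # second stage: SRS in each psu
        prob = Fraction(1, comb(M, m))
        est = Fraction(0)
        for i, sub in zip(chosen, subsamples):
            prob *= Fraction(1, comb(len(psus[i]), n_i[i]))
            est += Fraction(M, m) * Fraction(len(psus[i]), n_i[i]) * sum(sub)
        mean_est += prob * est
        mean_sq += prob * est ** 2
var_enumerated = mean_sq - mean_est ** 2

# Equation (3.49): between-psu term plus within-psu term.
totals = [sum(psu) for psu in psus]
between = M ** 2 * Fraction(M - m, M - 1) * pop_var(totals) / m
within = Fraction(M, m) * sum(
    len(psu) ** 2 * Fraction(len(psu) - n, len(psu) - 1) * pop_var(psu) / n
    for psu, n in zip(psus, n_i))
var_formula = between + within

print(mean_est, var_enumerated, var_formula)
```

Note that samples are weighted by their probabilities rather than counted, because the number of possible within-psu samples differs from one pair of psu's to another.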


Illustration 3.17. The sampling specifications are: (1) at the first stage select $m$ psu's with replacement and probability $P(i) = \dfrac{N_i}{N}$, and (2) at the second stage a simple random sample of $\bar{n}$ elements is to be selected from each of the $m$ psu's selected at the first stage. This will give a sample of $n = m\bar{n}$ elements. Find the variance of the sample estimate of the population total.

The estimator needs to be changed because the psu's are not selected with equal probability. Sample values need to be weighted by the reciprocals of their probabilities of selection if the estimator is to be unbiased. Let

$P'(ij)$ be the probability of element $ij$ being in the sample,

$P'(i)$ be the relative frequency of the $i$th psu being in a sample of $m$, and let

$P'(j|i)$ equal the conditional probability of element $ij$ being in the sample given that the $i$th psu is already in the sample.

Then

$$P'(ij) = P'(i)P'(j|i)$$

According to the sampling specifications $P'(i) = m\dfrac{N_i}{N}$. This probability was described as relative frequency because "probability of being in a sample of $m$ psu's" is subject to misinterpretation. The $i$th psu can appear in a sample more than once and it is counted every time it appears. That is, if the $i$th psu is selected more than once, a sample of $\bar{n}$ is selected within the $i$th psu every time that it is selected. By substitution

$$P'(ij) = \left[m\frac{N_i}{N}\right]\left[\frac{\bar{n}}{N_i}\right] = \frac{m\bar{n}}{N} = \frac{n}{N} \tag{3.50}$$

Equation (3.50) means that every element has an equal probability of being in the sample. Consequently, the estimator is very simple,

$$x' = \frac{N}{m\bar{n}}\sum_{i}^{m}\sum_{j}^{\bar{n}} x_{ij} \tag{3.51}$$

Exercise 3.18. Show that $x'$, Equation (3.51), is an unbiased estimator of the population total.

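In finding $V(x')$ our first step was to solve for $E_2(x'|i)$.

Step 1. By definition

$$E_2(x'|i) = E_2\left\{\left[\frac{N}{m\bar{n}}\sum_{i}^{m}\sum_{j}^{\bar{n}} x_{ij}\right]\Big|\, i\right\}$$

Since $\dfrac{N}{m\bar{n}}$ is constant with regard to $E_2$,

$$E_2(x'|i) = \frac{N}{m\bar{n}}\sum_{i}^{m}\sum_{j}^{\bar{n}} E_2(x_{ij}|i) \tag{3.52}$$

Proceeding from Equation (3.52) to the following result is left as an exercise:

$$E_2(x'|i) = \frac{N}{m}\sum_{i}^{m} \bar{X}_{i\cdot} \tag{3.53}$$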


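Step 2. From Equation (3.53) we have

$$V_1E_2(x'|i) = V_1\left(\frac{N}{m}\sum_{i}^{m} \bar{X}_{i\cdot}\right)$$

Since the $\bar{X}_{i\cdot}$'s are independent

$$V_1E_2(x'|i) = \frac{N^2}{m^2}\sum_{i}^{m} V_1(\bar{X}_{i\cdot})$$

Because the first stage of sampling is sampling with probability proportional to $N_i$ and with replacement,

$$V_1(\bar{X}_{i\cdot}) = \sum_{i}^{M} \frac{N_i}{N}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 \tag{3.54}$$

Let

$$V_1(\bar{X}_{i\cdot}) = \sigma_{b2}^2$$

Then

$$V_1E_2(x'|i) = \frac{N^2}{m^2}(m\sigma_{b2}^2) = \frac{N^2}{m}\sigma_{b2}^2 \tag{3.55}$$

Exercise 3.19. Prove that $E(\bar{X}_{i\cdot}) = \bar{X}_{\cdot\cdot}$, which shows that it is appropriate to use $\bar{X}_{\cdot\cdot}$ in Equation (3.54).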

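Step 3. To find $V_2(x'|i)$, first write the estimator as

$$x' = \frac{N}{m}\sum_{i}^{m} \bar{x}_{i\cdot} \tag{3.56}$$

Then, since the $\bar{x}_{i\cdot}$'s are independent

$$V_2(x'|i) = \frac{N^2}{m^2}\sum_{i}^{m} V_2(\bar{x}_{i\cdot}|i)$$

and

$$V_2(\bar{x}_{i\cdot}|i) = \frac{N_i-\bar{n}}{N_i-1}\,\frac{\sigma_i^2}{\bar{n}}$$

where

$$\sigma_i^2 = \sum_{j}^{N_i} \frac{1}{N_i}(X_{ij} - \bar{X}_{i\cdot})^2$$

Therefore

$$V_2(x'|i) = \frac{N^2}{m^2}\sum_{i}^{m} \frac{N_i-\bar{n}}{N_i-1}\,\frac{\sigma_i^2}{\bar{n}}$$

Step 4.

$$E_1V_2(x'|i) = \frac{N^2}{m^2\bar{n}}\sum_{i}^{m} E_1\left[\frac{N_i-\bar{n}}{N_i-1}\,\sigma_i^2\right]$$

Since the probability of $V_2(\bar{x}_{i\cdot}|i)$ is $\dfrac{N_i}{N}$,

$$E_1V_2(x'|i) = \frac{N^2}{m^2\bar{n}}\sum_{i}^{m}\left[\sum_{i}^{M} \frac{N_i}{N}\,\frac{N_i-\bar{n}}{N_i-1}\,\sigma_i^2\right]$$

which becomes

$$E_1V_2(x'|i) = \frac{N^2}{m\bar{n}}\sum_{i}^{M} \frac{N_i}{N}\,\frac{N_i-\bar{n}}{N_i-1}\,\sigma_i^2 \tag{3.57}$$

Step 5. Combining Equation (3.55) and Equation (3.57) we have the answer

$$V(x') = N^2\left[\frac{\sigma_{b2}^2}{m} + \frac{1}{m\bar{n}}\sum_{i}^{M} \frac{N_i}{N}\left(\frac{N_i-\bar{n}}{N_i-1}\right)\sigma_i^2\right] \tag{3.58}$$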
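As with Illustration 3.16, Equation (3.58) can be checked by enumeration. The sketch below assumes a hypothetical population with $m = 2$ draws and $\bar{n} = 2$ (all values are illustrative assumptions). Because the first stage is with replacement, it loops over ordered draws of psu's, each with probability $N_i/N$, and takes an independent simple random sample of $\bar{n}$ elements for every draw.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

# Hypothetical design (assumed for illustration only).
psus = [[2, 1], [6, 4, 7], [8, 11, 9]]
m, n_bar = 2, 2
M = len(psus)
N = sum(len(psu) for psu in psus)


def pop_var(values):
    """Variance with divisor N, matching the sigma^2 definitions in the text."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / len(values)


mean_est = Fraction(0)
mean_sq = Fraction(0)
for draws in product(range(M), repeat=m):        # ordered draws with replacement
    p_first = Fraction(1)
    for i in draws:
        p_first *= Fraction(len(psus[i]), N)      # P(i) = N_i / N on each draw
    within_samples = [list(combinations(psus[i], n_bar)) for i in draws]
    for subsamples in product(*within_samples):   # independent SRS for every draw
        prob = p_first
        for i in draws:
            prob *= Fraction(1, comb(len(psus[i]), n_bar))
        # Estimator of Equation (3.51).
        est = Fraction(N, m * n_bar) * sum(sum(sub) for sub in subsamples)
        mean_est += prob * est
        mean_sq += prob * est ** 2
var_enumerated = mean_sq - mean_est ** 2

# Equation (3.58).
grand_mean = Fraction(sum(sum(psu) for psu in psus), N)
sigma_b2 = sum(Fraction(len(psu), N) * (Fraction(sum(psu), len(psu)) - grand_mean) ** 2
               for psu in psus)                   # Equation (3.54)
within_term = sum(Fraction(len(psu), N) * Fraction(len(psu) - n_bar, len(psu) - 1)
                  * pop_var(psu) for psu in psus)
var_formula = N ** 2 * (sigma_b2 / m + within_term / (m * n_bar))

print(mean_est, var_enumerated, var_formula)
```

The first psu has only two elements, so with $\bar{n} = 2$ its within-psu contribution is zero; this is a convenient check that the factor $(N_i - \bar{n})/(N_i - 1)$ behaves as expected.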
CHAPTER IV. THE DISTRIBUTION OF AN ESTIMATE

4.1  PROPERTIES  OF  SIMPLE  RANDOM  SAMPLES 

The distribution of an estimate is a primary basis for judging the accuracy of an estimate from a sample survey. But an estimate is only one number. How can one number have a distribution? Actually, "distribution of an estimate" is a phrase that refers to the distribution of all possible estimates that might occur under repetition of a prescribed sampling plan and estimator (method of estimation). Thanks to theory and empirical testing of the theory, it is not necessary to generate physically the distribution of an estimate by selecting numerous samples and making an estimate from each. However, to have a tangible distribution of an estimate as a basis for discussion, an illustration has been prepared.

Illustration 4.1. Consider simple random samples of 4 from an assumed population of 8 elements. There are $\binom{8}{4} = \dfrac{8!}{4!\,4!} = 70$ possible samples. In Table 4.1, the sample values for all of the 70 possible samples of four are shown. The 70 samples were first listed in an orderly manner to facilitate getting all of them accurately recorded. The mean, $\bar{x}$, for each sample was computed and the samples were then arrayed according to the value of $\bar{x}$ for purposes of presentation in Table 4.1. The distribution of $\bar{x}$ is the 70 values of $\bar{x}$ shown in Table 4.1, including the fact that each of the 70 values of $\bar{x}$ has an equal probability of being the estimate. These 70 values have been arranged as a frequency distribution in Table 4.2.

As discussed previously, one of the properties of simple random sampling is that the sample average is an unbiased estimate of the population average; that is, $E(\bar{x}) = \bar{X}$. This means that the distribution of
Table 4.1. Samples of four elements from a population of eight 1/

  Sample   Values                        Sample   Values
  number   of $x_i$   $\bar{x}$  $s^2$   number   of $x_i$   $\bar{x}$  $s^2$

   1c      2,1,6,4     3.25    4.917      36s     1,6,8,9     6.00   12.667
   2       2,1,4,7     3.50    7.000      37s     1,4,8,11    6.00   19.333
   3       2,1,4,8     3.75    9.583      38s     2,6,8,9     6.25    9.583
   4       2,1,6,7     4.00    8.667      39s     2,4,8,11    6.25   16.250
   5       2,1,4,9     4.00   12.667      40s     1,6,7,11    6.25   16.917
   6       2,1,6,8     4.25   10.917      41s     1,4,11,9    6.25   20.917
   7       2,1,6,9     4.50   13.667      42      1,7,8,9     6.25   12.917
   8       2,1,4,11    4.50   20.333      43cs    6,4,7,8     6.25    2.917
   9cs     2,1,7,8     4.50   12.333      44s     2,6,7,11    6.50   13.667
  10       1,6,4,7     4.50    7.000      45s     2,4,11,9    6.50   17.667
  11s      2,1,7,9     4.75   14.917      46      2,7,8,9     6.50    9.667
  12       2,6,4,7     4.75    4.917      47s     1,6,8,11    6.50   17.667
  13       1,6,4,8     4.75    8.917      48s     6,4,7,9     6.50    4.333
  14       2,1,6,11    5.00   20.667      49s     2,6,8,11    6.75   14.250
  15s      2,1,8,9     5.00   16.667      50s     1,6,11,9    6.75   18.917
  16       2,6,4,8     5.00    6.667      51      1,7,8,11    6.75   17.583
  17       1,6,4,9     5.00   11.333      52s     6,4,8,9     6.75    4.917
  18s      1,4,7,8     5.00   10.000      53s     2,6,11,9    7.00   15.333
  19s      2,1,7,11    5.25   21.583      54      2,7,8,11    7.00   14.000
  20       2,6,4,9     5.25    8.917      55      1,7,11,9    7.00   18.667
  21s      2,4,7,8     5.25    7.583      56s     6,4,7,11    7.00    8.667
  22s      1,4,7,9     5.25   12.250      57      4,7,8,9     7.00    4.667
  23s      2,1,8,11    5.50   23.000      58      2,7,11,9    7.25   14.917
  24s      2,4,7,9     5.50    9.667      59      1,8,11,9    7.25   18.917
  25       1,6,4,11    5.50   17.667      60s     6,4,8,11    7.25    8.917
  26s      1,6,7,8     5.50    9.667      61      2,8,11,9    7.50   15.000
  27s      1,4,8,9     5.50   13.667      62cs    6,4,11,9    7.50    9.667
  28cs     2,1,11,9    5.75   24.917      63      6,7,8,9     7.50    1.667
  29       2,6,4,11    5.75   14.917      64      4,7,8,11    7.50    8.333
  30s      2,6,7,8     5.75    6.917      65      4,7,11,9    7.75    8.917
  31s      2,4,8,9     5.75   10.917      66      6,7,8,11    8.00    4.667
  32s      1,6,7,9     5.75   11.583      67      4,8,11,9    8.00    8.667
  33s      1,4,7,11    5.75   18.250      68      6,7,11,9    8.25    4.917
  34s      2,6,7,9     6.00    8.667      69      6,8,11,9    8.50    4.333
  35s      2,4,7,11    6.00   15.333      70c     7,8,11,9    8.75    2.917

1/ Values of $X$ for the population of eight elements are $X_1 = 2$, $X_2 = 1$, $X_3 = 6$, $X_4 = 4$, $X_5 = 7$, $X_6 = 8$, $X_7 = 11$, $X_8 = 9$; $\bar{X} = 6.00$; and $S^2 = \dfrac{\sum (X_i - \bar{X})^2}{N-1} = 12$.
Table 4.2. Sampling distribution of $\bar{x}$

                          Relative frequency of $\bar{x}$
                   Simple random      Cluster            Stratified random
                   sampling           sampling           sampling
  $\bar{x}$        Illustration 4.1   Illustration 4.2   Illustration 4.2

  3.25                   1                  1
  3.50                   1
  3.75                   1
  4.00                   2
  4.25                   1
  4.50                   4                  1                  1
  4.75                   3                                     1
  5.00                   5                                     2
  5.25                   4                                     3
  5.50                   5                                     4
  5.75                   6                  1                  5
  6.00                   4                                     4
  6.25                   6                  1                  5
  6.50                   5                                     4
  6.75                   4                                     3
  7.00                   5                                     2
  7.25                   3                                     1
  7.50                   4                  1                  1
  7.75                   1
  8.00                   2
  8.25                   1
  8.50                   1
  8.75                   1                  1

  Total                 70                  6                 36

  Expected value
  of $\bar{x}$           6.00               6.00               6.00

  Variance
  of $\bar{x}$           1.50               3.29               0.49
$\bar{x}$ is centered on $\bar{X}$. If the theory is correct, the average of $\bar{x}$ for the 70 samples, which are equally likely to occur, should be equal to the population average, 6.00. The average of the 70 samples does equal 6.00.

From the theory of expected values, we also know that the variance of $\bar{x}$ is given by

$$S_{\bar{x}}^2 = \frac{N-n}{N}\,\frac{S^2}{n} \tag{4.1}$$

where

$$S^2 = \frac{\sum_{i}^{N} (X_i - \bar{X})^2}{N-1}$$

With reference to Illustration 4.1 and Table 4.1, $S^2 = 12.00$ and $S_{\bar{x}}^2 = \dfrac{8-4}{8}\cdot\dfrac{12}{4} = 1.5$. The formula (4.1) can be verified by computing the variance among the 70 values of $\bar{x}$ as follows:

$$\frac{(3.25-6.00)^2 + (3.50-6.00)^2 + \cdots + (8.75-6.00)^2}{70} = 1.5$$

Since $S^2$ is a population parameter, it is usually unknown. Fortunately, as discussed in Chapter 3, $E(s^2) = S^2$ where

$$s^2 = \frac{\sum_{i}^{n} (x_i - \bar{x})^2}{n-1}$$

In Table 4.1, the value of $s^2$ is shown for each of the 70 samples. The average of the 70 values of $s^2$ is equal to $S^2$. The fact that $E(s^2) = S^2$ is another important property of simple random samples. In practice $s^2$ is used as an estimate of $S^2$. That is,

$$s_{\bar{x}}^2 = \frac{N-n}{N}\,\frac{s^2}{n}$$

is an unbiased estimate of the variance of $\bar{x}$.

To recapitulate, we have just verified three important properties of simple random samples:

(1) $E(\bar{x}) = \bar{X}$

(2) $S_{\bar{x}} = \sqrt{\dfrac{N-n}{N}}\;\dfrac{S}{\sqrt{n}}$

(3) $E(s^2) = S^2$
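The three properties can be confirmed by enumerating all 70 samples directly. The sketch below uses the eight population values given in the footnote to Table 4.1; the helper names `mean` and `s2` are assumptions introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Population of eight elements from the footnote to Table 4.1.
population = [2, 1, 6, 4, 7, 8, 11, 9]
N, n = len(population), 4


def mean(values):
    return Fraction(sum(values), len(values))


def s2(values):
    """Variance with divisor (number of values - 1)."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / (len(values) - 1)


samples = list(combinations(population, n))   # all equally likely samples
means = [mean(s) for s in samples]

# Property (1): the average of the sample means equals the population mean.
expected_mean = sum(means) / len(samples)

# Property (2): variance among the sample means versus Equation (4.1).
var_of_mean = sum((xb - expected_mean) ** 2 for xb in means) / len(samples)
S2 = s2(population)
var_formula = Fraction(N - n, N) * S2 / n

# Property (3): the average of the sample variances equals S^2.
expected_s2 = sum(s2(s) for s in samples) / len(samples)

print(len(samples), expected_mean, var_of_mean, var_formula, expected_s2, S2)
```

Exact fractions are used so that the equalities hold exactly rather than only to rounding error.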

The standard error of $\bar{x}$, namely $S_{\bar{x}}$, is a measure of how much $\bar{x}$ varies under repeated sampling from $\bar{X}$. Incidentally, notice that Equation (4.1) shows how the variance of $\bar{x}$ is related to the size of the sample. Now we need to consider the form or shape of the distribution of $\bar{x}$.

Definition  4.1.  The  distribution  of  an  estimate  is  often  called  the 
sampling  distribution.  It  refers  to  the  distribution  of  all  possible 
values  of  an  estimate  that  could  occur  under  a prescribed  sampling  plan. 
4.2  SHAPE  OF  THE  SAMPLING  DISTRIBUTION 

For  random  sampling  there  is  a large  volume  of  literature  on  the 
distribution  of  an  estimate  which  we  will  not  attempt  to  review.  In 
practice,  the  distribution  is  generally  accepted  as  being  normal  (See 
Figure  4.1)  unless  the  sample  size  is  "small."  The  theory  and  empirical 
tests  show  that  the  distribution  of  an  estimate  approaches  the  normal 
distribution  rapidly  as  the  size  of  the  sample  increases.  The  closeness 
of  the  distribution  of  an  estimate  to  the  normal  distribution  depends  on: 
(1) the distribution of $X$ (i.e., the shape of the frequency distribution of the values of $X$ in the population being sampled), (2) the form of the estimator, (3) the sample design, and (4) the sample size. It is not possible to give a few simple, exact guidelines for deciding when the degree of approximation is good enough. In practice, it is generally a matter of working as though the distribution of an estimate is normal but being mindful of the possibility that the distribution might differ considerably from normal when the sample is very small and the population distribution is highly skewed. 3/

It is very fortunate that the sampling distribution is approximately normal as it gives a basis for probability statements about the precision of an estimate. As notation, $x'$ will be the general expression for any estimate, and $\sigma_{x'}$ is the standard error of $x'$.

Figure  4.1  is  a graphical  representation  of  the  sampling  distribution 
of  an  estimate.  It  is  the  normal  distribution.  In  the  mathematical 
equation  for  the  normal  distribution  of  a variable  there  are  two  parameters: 
the  average  value  of  the  variable,  and  the  standard  error  of  the  variable. 


_3/  For  a good  discussion  of  the  distribution  of  a sample  estimate,  see 
Vol.  I,  Chapter  1,  Hansen,  Hurwitz,  and  Madow.  Sample  Survey  Methods  and 
Theory,  John  Wiley  and  Sons,  1953. 
Suppose $x'$ is an estimate from a probability sample. The characteristics of the sampling distribution of $x'$ are specified by three things: (1) the expected value of $x'$, $E(x')$, which is the mean of the distribution; (2) the standard error of $x'$, $\sigma_{x'}$; and (3) the assumption that the distribution is normal. If $x'$ is normally distributed, about two-thirds of the values that $x'$ could equal are between $[E(x') - \sigma_{x'}]$ and $[E(x') + \sigma_{x'}]$, about 95 percent of the possible values of $x'$ are between $[E(x') - 2\sigma_{x'}]$ and $[E(x') + 2\sigma_{x'}]$, and 99.7 percent of the estimates are within $3\sigma_{x'}$ from $E(x')$.
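The percentages quoted above follow from the normal distribution itself. As a small sketch, the probability that a normally distributed estimate falls within $k$ standard errors of its expected value is $\operatorname{erf}(k/\sqrt{2})$; the function name `normal_coverage` is an assumption used only for this example.

```python
from math import erf, sqrt


def normal_coverage(k):
    """P(|x' - E(x')| < k * sigma_x') when x' is normally distributed."""
    return erf(k / sqrt(2))


# Coverage for one, two, and three standard errors.
coverage = {k: normal_coverage(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard error(s): {100 * p:.1f} percent")
```

The one-standard-error figure is slightly more than two-thirds, and the two-standard-error figure is slightly more than 95 percent, so the rounded values in the text are on the conservative side.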

Exercise 4.1. With reference to Illustration 4.1, find $E(\bar{x}) - \sigma_{\bar{x}}$ and $E(\bar{x}) + \sigma_{\bar{x}}$. Refer to Table 4.2 and find the proportion of the 70 values of $\bar{x}$ that are between $E(\bar{x}) - \sigma_{\bar{x}}$ and $E(\bar{x}) + \sigma_{\bar{x}}$. How does this compare with the expected proportion assuming the sampling distribution of $\bar{x}$ is normal? The normal approximation is not expected to be close, owing to the small size of the population and of the sample. Also compute $E(\bar{x}) - 2\sigma_{\bar{x}}$ and $E(\bar{x}) + 2\sigma_{\bar{x}}$ and find the proportion of the 70 values of $\bar{x}$ that are between these two limits.

4.3  SAMPLE  DESIGN 

There are many methods of designing and selecting samples and of making estimates from samples. Each sampling method and estimator has a sampling distribution. Since the sampling distribution is assumed to be normal, alternative methods are compared in terms of $E(x')$ and $\sigma_{x'}$ (or $\sigma_{x'}^2$).

For simple random sampling, we have seen, for a sample of $n$, that every possible combination of $n$ elements has an equal chance of being the sample selected. Some of these possible combinations (samples) are much better than others. It is possible to introduce restrictions in sampling so some of the combinations cannot occur or so some combinations have a higher probability of occurrence than others. This can be done without introducing bias in the estimate $x'$ and without losing a basis for estimating $\sigma_{x'}$. Discussion of particular sample designs is not a primary purpose of this chapter. However, a few simple illustrations will be used to introduce the subject of design and to help develop concepts of sampling variation.

Illustration 4.2. Suppose the population of 8 elements used in Table 4.1 is arranged so it consists of four sampling units as follows:

  Sampling unit   Elements   Values of $X$             Sampling unit total

        1           1,2      $X_1 = 2$,  $X_2 = 1$              3
        2           3,4      $X_3 = 6$,  $X_4 = 4$             10
        3           5,6      $X_5 = 7$,  $X_6 = 8$             15
        4           7,8      $X_7 = 11$, $X_8 = 9$             20

For sampling purposes the population now consists of four sampling units rather than eight elements. If we select a simple random sample of two sampling units from the population of four sampling units, it is clear that the sampling theory for simple random sampling applies. This illustration points out the importance of making a clear distinction between a sampling unit and an element that a measurement pertains to. A sampling unit corresponds to a random selection and it is the variation among sampling units (random selections) that determines the sampling error of an estimate. When the sampling units are composed of more than one element, the sampling is commonly referred to as cluster sampling because the elements in a sampling unit are usually close together geographically.
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For  a simple  random  sample  of  2 sampling  units,  the  variance  of  x , 

c 

where  xc  is  the  sample  average  per  sampling  unit,  is 

S-  13.17 

x N n 
c 


where 


N 


4,  n = 2,  and  S 


2 (3-12) 2 + (10-12) 2 + (15-12)2  + (20-12) 2 _ 158 


Instead  of  the  average  per  sampling  unit  one  will  probably  be  interested 

x 

in  the  average  per  element,  which  is  x * y-  , since  there  are  two  elements 

in  each  sampling  unit.  The  variance  of  x is  one-fourth  of  the  variance 

13.17 


of  x . Hence,  the  variance  of  x is 
c 


3.29. 


There are only six possible random samples as follows:

Sample    Sampling Units    Sample average per           s_c^2
                            sampling unit, x̄_c

  1           1, 2                 6.5                    24.5
  2           1, 3                 9.0                    72.0
  3           1, 4                11.5                   144.5
  4           2, 3                12.5                    12.5
  5           2, 4                15.0                    50.0
  6           3, 4                17.5                    12.5

where

$$s_c^2 = \frac{\sum_i^n (x_i - \bar{x}_c)^2}{n-1}$$

and $x_i$ is a sampling unit total. Be sure to notice that $s_c^2$ (which is
the sample estimate of $S_c^2$) is the variance among sampling units in the
sample, not the variance among individual elements in the sample. From the
list of six samples, it is easy to verify that $\bar{x}_c$ is an unbiased
estimate of the population average per sampling unit and that $s_c^2$ is an
unbiased estimate of $158/3$, the variance among the four sampling units in
the population. Also, the variance among the six values of $\bar{x}_c$ is
13.17 which agrees with the formula.
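The verification suggested above can be carried out by direct enumeration. The following Python sketch is an illustration only; the variable names are assumptions of this example and do not appear in the text. It lists the six possible samples of two sampling units and compares the enumerated results with the formula.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

# Sampling-unit totals from Illustration 4.2 (units 1 through 4).
unit_totals = [3.0, 10.0, 15.0, 20.0]
N, n = len(unit_totals), 2

# Every possible simple random sample of n sampling units.
samples = list(combinations(unit_totals, n))

# Sample average per sampling unit and sample variance among units.
xbar_c = [mean(s) for s in samples]
s2_c = [variance(s) for s in samples]       # divisor n - 1

S2_c = variance(unit_totals)                # population S_c^2, divisor N - 1
var_formula = (N - n) / N * S2_c / n        # variance of xbar_c from the formula
var_enumerated = pvariance(xbar_c)          # variance among the six averages

print("number of samples:", len(samples))
print("mean of sample averages:", mean(xbar_c))
print("mean of s_c^2:", mean(s2_c), " S_c^2:", S2_c)
print("variance of xbar_c:", var_formula, var_enumerated)
print("variance per element:", var_enumerated / 4)
```

The average of the six sample averages equals the population average per sampling unit, and the average of the six values of s_c^2 equals S_c^2, which is what unbiasedness means here.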

The six possible cluster samples are among the 70 samples listed in
Table 4.1. Their sample numbers in Table 4.1 are 1, 9, 28, 43, 62, and
70. A "c" follows these sample numbers. The sampling distribution for
the six samples is shown in Table 4.2 for comparison with simple random
sampling. It is clear from inspection that random selection from these
six is less desirable than random selection from the 70. For example,
one of the two extreme averages, 3.25 or 8.75, has a probability of 1/3 of
occurring for the cluster sampling and a probability of only 1/35 when
selecting a simple random sample of four elements. In this illustration,
the sampling restriction (clustering of elements) increased the sampling
variance from 1.5 to 3.29.

It is of importance to note that the average variance among elements
within the four clusters is only 1.25. (Students should compute the
within-cluster variances and verify 1.25.) This is much less than 12.00,
the variance among the 8 elements of the population. In reality, the
variance among elements within clusters is usually less than the variance
among all elements in the population, because clusters (sampling units) are
usually composed of elements that are close together and elements that are
close together usually show a tendency to be alike.
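As a check on the two figures quoted above, the within-cluster variances and the variance among all eight elements can be computed directly. This is a minimal sketch; the list names are assumptions of the example.

```python
from statistics import mean, variance

# Element values X1..X8 and the four clusters of Illustration 4.2.
values = [2.0, 1.0, 6.0, 4.0, 7.0, 8.0, 11.0, 9.0]
clusters = [values[0:2], values[2:4], values[4:6], values[6:8]]

within = [variance(c) for c in clusters]    # variance within each cluster
avg_within = mean(within)                   # average within-cluster variance
S2_elements = variance(values)              # variance among all 8 elements

print("within-cluster variances:", within)
print("average within-cluster variance:", avg_within)
print("variance among all elements:", S2_elements)
```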

Exercise 4.2. In Illustration 4.2, if the average variance among
elements within clusters had been greater than 12.00, the sampling variance
for cluster sampling would have been less than the sampling variance for a
simple random sample of elements. Repeat what was done in Illustration 4.2
using as sampling units elements 1 and 6, 2 and 5, 3 and 8, and 4 and 7.
Study the results.

Illustration 4.3. Perhaps the most common method of sampling is to
assign sampling units of a population to groups called strata. A simple
random sample is then selected from each stratum. Suppose the population
used in Illustration 4.1 is divided into two strata as follows:

Stratum 1    X1 = 2,   X2 = 1,   X3 = 6,    X4 = 4

Stratum 2    X5 = 7,   X6 = 8,   X7 = 11,   X8 = 9

The sampling plan is to select a simple random sample of two elements
from each stratum. There are 36 possible samples of 4, two from each
stratum. These 36 samples are identified in Table 4.1 by an "s" after the
sample number so you may compare the 36 possible stratified random samples
with the 70 simple random samples and with the six cluster samples. Also,
see Table 4.2.

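Consider the variance of $\bar{x}$. We can write

$$\bar{x} = \frac{\bar{x}_1 + \bar{x}_2}{2}$$

where $\bar{x}_1$ is the sample average for stratum 1 and $\bar{x}_2$ is the
average for stratum 2. According to Theorem 3.5

$$S_{\bar{x}}^2 = \frac{1}{4}\left(S_{\bar{x}_1}^2 + S_{\bar{x}_2}^2 + 2S_{\bar{x}_1\bar{x}_2}\right)$$

We know the covariance, $S_{\bar{x}_1\bar{x}_2}$, is zero because the
sampling from one stratum is independent of the sampling from the other
stratum. And, since the sample within each stratum is a simple random
sample,

$$S_{\bar{x}_1}^2 = \frac{N_1-n_1}{N_1}\,\frac{S_1^2}{n_1}$$

where

$$S_1^2 = \frac{\sum_i^{N_1} (X_{1i} - \bar{X}_1)^2}{N_1 - 1}$$

The subscript "1" refers to stratum 1. $S_{\bar{x}_2}^2$ is of the same form
as $S_{\bar{x}_1}^2$. Therefore,

$$S_{\bar{x}}^2 = \frac{1}{4}\left[\frac{N_1-n_1}{N_1}\,\frac{S_1^2}{n_1} + \frac{N_2-n_2}{N_2}\,\frac{S_2^2}{n_2}\right]$$

Since

$$\frac{N_1-n_1}{N_1} = \frac{N_2-n_2}{N_2} = \frac{1}{2} \quad\text{and}\quad n_1 = n_2 = 2,$$

$$S_{\bar{x}}^2 = \frac{1}{8}\left[\frac{S_1^2}{2} + \frac{S_2^2}{2}\right] = \frac{1}{8}\left[\frac{4.92}{2} + \frac{2.92}{2}\right] = 0.49$$

The variance, 0.49, is comparable to 1.5 in Illustration 4.1 and to 3.29 in
Illustration 4.2.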
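The stratified result can also be checked by enumerating the 36 possible samples. The sketch below is illustrative; the helper name var_of_stratum_mean and the variable names are assumptions of this example, not notation from the text.

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

stratum1 = [2.0, 1.0, 6.0, 4.0]     # X1..X4
stratum2 = [7.0, 8.0, 11.0, 9.0]    # X5..X8
n_h = 2                             # sample size taken from each stratum

# All 6 x 6 = 36 stratified samples: two elements from each stratum.
samples = [s1 + s2 for s1, s2 in product(combinations(stratum1, n_h),
                                         combinations(stratum2, n_h))]
xbar = [mean(s) for s in samples]

def var_of_stratum_mean(stratum, n):
    """Variance of a simple-random-sample mean within one stratum."""
    N = len(stratum)
    return (N - n) / N * variance(stratum) / n

# Theorem 3.5 with zero covariance between the two stratum means.
var_formula = 0.25 * (var_of_stratum_mean(stratum1, n_h)
                      + var_of_stratum_mean(stratum2, n_h))
var_enumerated = pvariance(xbar)    # variance among the 36 sample means

print("number of samples:", len(samples))
print("mean of sample means:", mean(xbar))
print("variance:", round(var_formula, 2), round(var_enumerated, 2))
```

Because both strata have the same size and the same sample size, the simple mean of the four sampled elements equals the average of the two stratum means.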

In Illustration 4.2, the sampling units were groups of two elements and
the variance among these groups (sampling units) appeared in the formula
for the variance of $\bar{x}$. In Illustration 4.3, each element was a
sampling unit but the selection process (randomization) was restricted to
taking one stratum (subset) at a time, so the sampling variance was
determined by variability within strata. As you study sampling plans, form
mental pictures of the variation which the sampling error depends on. With
experience and accumulated knowledge of what the patterns of variation in
various populations are like, one can become expert in judging the
efficiency of alternative sampling plans in relation to specific objectives
of a survey.

If the population and the samples in the above illustrations had been
larger, the distributions in Table 4.2 would have been approximately
normal. Thus, since the form of the distribution of an estimate from a
probability sample survey is accepted as being normal, only two attributes
of an estimate need to be evaluated, namely its expected value and its
variance.

In the above illustrations ideal conditions were implicitly assumed.
Such conditions do not exist in the real world so the theory must be
extended to fit, more exactly, actual conditions. There are numerous
sources of error or variation to be evaluated. The nature of the
relationship between theory and practice is a major governing factor
determining the rate of progress toward improvement of the accuracy of
survey results.

We  will  now  extend  error  concepts  toward  more  practical  settings. 

4.4  RESPONSE  ERROR 

So  far,  we  have  discussed  sampling  under  implicit  assumptions  that 
measurements  are  obtained  from  all  n elements  in  a sample  and  that  the 
measurement  for  each  element  is  without  error.  Neither  assumption  fits, 
exactly,  the  real  world.  In  addition,  there  are  "coverage"  errors  of 
various  kinds.  For  example,  for  a farm  survey  a farm  is  defined  but 
application  of  the  definition  involves  some  degree  of  ambiguity  about 
whether  particular  enterprises  satisfy  the  definition.  Also,  two  persons 
might  have  an  interest  in  the  same  farm  tract  giving  rise  to  the  possibility 
that  the  tract  might  be  counted  twice  (included  as  a part  of  two  farms)  or 
omitted  entirely. 

Partly to emphasize that error in an estimate is more than a matter
of sampling, statisticians often classify the numerous sources of error
into one of two general classes: (1) sampling errors, which are errors
associated with the fact that one has measurements for a sample of elements
rather than measurements for all elements in the population, and (2)
nonsampling errors, which are errors that occur whether sampling is
involved or not. Mathematical error models can be very complex when they
include a term for each of many sources of error and attempt to represent
exactly the real world. However, complicated error models are not always
necessary, depending upon the purposes.

For  purposes  of  discussion,  two  oversimplified  response-error  models 
will  be  used.  This  will  introduce  the  subject  of  response  error  and  give 
some  clues  regarding  the  nature  of  the  impact  of  response  error  on  the 
distribution  of  an  estimate.  For  simplicity,  we  will  assume  that  a 
measurement  is  obtained  for  each  element  in  a random  sample  and  that  no 
ambiguity  exists  regarding  the  identity  or  definition  of  an  element.  Thus, 
we  will  be  considering  sampling  error  and  response  error  simultaneously. 

Illustration 4.4. Let $T_1, \ldots, T_N$ be the "true values" of some
variable for the $N$ elements of a population. The mention of true values
raises numerous questions about what is a true value. For example, what is
your true weight? How would you define the true weight of an individual? We
will refrain from discussing the problem of defining true values and simply
assume that true values do exist according to some practical definition.
When an attempt is made to ascertain $T_i$, some value other than $T_i$
might be obtained. Call the actual value obtained $X_i$. The difference,
$e_i = X_i - T_i$, is the response error for the $i$th element. If the
characteristic, for example, is a person's weight, the observed weight,
$X_i$, for the $i$th individual depends upon when and how the measurement is
taken. However, for simplicity, assume that $X_i$ is always the value
obtained regardless of the conditions under which the measurement is taken.
In other words, assume that the response error, $e_i$, is constant for the
$i$th element. In this hypothetical case, we are actually sampling a
population set of values $X_1, \ldots, X_N$ instead of a set of true values
$T_1, \ldots, T_N$.

Under the conditions as stated, the sampling theory applies exactly
to the set of population values $X_1, \ldots, X_N$. If a simple random
sample of elements is selected and measurements for all elements in the
sample are obtained, then $E(\bar{x}) = \bar{X}$. That is, if the purpose is
to estimate $\bar{T} = \sum_i^N T_i / N$, the estimate is biased unless
$\bar{T}$ happens to be equal to $\bar{X}$. The bias is
$\bar{X} - \bar{T}$, which is appropriately called "response bias."


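Rewrite $e_i = X_i - T_i$ as follows:

$$X_i = T_i + e_i \qquad (4.2)$$

Then, the mean of a simple random sample may be expressed as

$$\bar{x} = \frac{\sum_i^n x_i}{n} = \frac{\sum_i^n (t_i + e_i)}{n}$$

or, as $\bar{x} = \bar{t} + \bar{e}$.

From the theory of expected values, we have

$$E(\bar{x}) = E(\bar{t}) + E(\bar{e})$$

Since $E(\bar{x}) = \bar{X}$ and $E(\bar{t}) = \bar{T}$ it follows that

$$\bar{X} = \bar{T} + E(\bar{e})$$

Thus, $\bar{x}$ is a biased estimate of $\bar{T}$ unless $E(\bar{e}) = 0$,
where $E(\bar{e}) = \sum_i^N e_i / N$. That is, $E(\bar{e})$ is the average
of the response errors, $e_i$, for the whole population.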

For simple random sampling the variance of $\bar{x}$ is

$$S_{\bar{x}}^2 = \frac{N-n}{N}\,\frac{S_X^2}{n} \quad\text{where}\quad S_X^2 = \frac{\sum_i^N (X_i - \bar{X})^2}{N-1}$$

How does the response error affect the variance of $X$ and of $\bar{x}$? We
have already written the observed value for the $i$th element as being equal
to its true value plus a response error, that is, $X_i = T_i + e_i$.
Assuming random sampling, $T_i$ and $e_i$ are random variables. We can use
Theorem 3.5 from Chapter III and write

$$S_X^2 = S_T^2 + S_e^2 + 2S_{T,e} \qquad (4.3)$$

where $S_X^2$ is the variance of $X$, $S_T^2$ is the variance of $T$,
$S_e^2$ is the response variance (that is, the variance of $e$), and
$S_{T,e}$ is the covariance of $T$ and $e$. The terms on the right-hand side
of Equation (4.3) cannot be evaluated unless data on $X_i$ and $T_i$ are
available; however, the equation does show how the response error influences
the variance of $X$ and hence of $\bar{x}$.

As a numerical example, assume a population of five elements and the
following values for $T$ and $X$:

              T_i      X_i      e_i

               23       26        3
               13       12       -1
               17       23        6
               25       25        0
                7        9        2

Average        17       19        2

Students may wish to verify the following results, especially the variance
of $e$ and the covariance of $T$ and $e$:

$$S_X^2 = 62.5 \qquad S_T^2 = 54.0 \qquad S_e^2 = 7.5 \qquad S_{T,e} = 0.5$$

As a verification of Equation (4.3) we have

$$62.5 = 54.0 + 7.5 + (2)(0.5)$$

From data in a simple random sample one would compute

$$s_x^2 = \frac{\sum_i^n (x_i - \bar{x})^2}{n-1}$$

and use $\frac{N-n}{N}\,\frac{s_x^2}{n}$ as an estimate of the variance of
$\bar{x}$. Is it clear that $s_x^2$ is an unbiased estimate of $S_X^2$
rather than of $S_T^2$ and that the impact of variation in $e_i$ is included
in $s_x^2$?

To summarize, response error caused a bias in $\bar{x}$ as an estimate of
$\bar{T}$ that was equal to $\bar{X} - \bar{T}$. In addition, it was a
source of variation included in the standard error of $\bar{x}$. To evaluate
bias and variance attributable to response error, information on $X_i$ and
$T_i$ must be available.
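The numerical example of Illustration 4.4 can be reproduced in a few lines. The sketch below is illustrative only: the helper cov and the sample size n = 2 used for the enumeration are assumptions of this example, since the text does not specify a sample size.

```python
from itertools import combinations
from statistics import mean, variance

T = [23.0, 13.0, 17.0, 25.0, 7.0]      # true values T_i
X = [26.0, 12.0, 23.0, 25.0, 9.0]      # observed values X_i
e = [x - t for x, t in zip(X, T)]      # response errors e_i = X_i - T_i

def cov(a, b):
    """Covariance with divisor N - 1, matching the S^2 notation of the text."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

S2_X, S2_T, S2_e = variance(X), variance(T), variance(e)
S_Te = cov(T, e)

# Response bias: difference between the population means of X and T.
response_bias = mean(X) - mean(T)

# Expected value of the sample mean over all simple random samples of
# n = 2 elements (n is an assumed value chosen for illustration).
n = 2
expected_xbar = mean([mean(s) for s in combinations(X, n)])

print("S2_X, S2_T, S2_e, S_Te:", S2_X, S2_T, S2_e, S_Te)
print("right-hand side of (4.3):", S2_T + S2_e + 2 * S_Te)
print("response bias:", response_bias, " E(xbar):", expected_xbar)
```

The enumeration shows the sample mean centered on the mean of the observed values, not on the mean of the true values, which is the response bias described above.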

Illustration 4.5. In this case, we assume that the response error
for a given element is not constant. That is, if an element were measured
on several occasions, the observed values for the $i$th element could vary
even though the true value, $T_i$, remained unchanged. Let the error model
be

$$X_{ij} = T_i + e_{ij}$$

where $X_{ij}$ is the observed value of $X$ for the $i$th element when the
observation is taken on a particular occasion, $j$,

$T_i$ is the true value of $X$ for the $i$th element,

and $e_{ij}$ is the response error for the $i$th element on a particular
occasion, $j$.

Assume, for any given element, that the response error, $e_{ij}$, is a
random variable. We can let $e_{ij} = e_i + \epsilon_{ij}$, where $e_i$ is
the average value of $e_{ij}$ for a fixed $i$, that is,
$e_i = E(e_{ij} \mid i)$. This divides the response error for the $i$th
element into two components: a constant component, $e_i$, and a variable
component, $\epsilon_{ij}$. By definition, the expected value of
$\epsilon_{ij}$ is zero for any given element. That is,
$E(\epsilon_{ij} \mid i) = 0$.

Substituting for $e_{ij}$, the model becomes

$$X_{ij} = T_i + e_i + \epsilon_{ij} \qquad (4.4)$$

The model, Equation (4.4), is now in a good form for comparison with
the model in Illustration 4.4. In Equation (4.4), $e_i$, like $e_i$ in
Equation (4.2), is constant for a given element. Thus, the two models
are alike except for the added term, $\epsilon_{ij}$, in Equation (4.4)
which allows for the possibility that the response error for the $i$th
element might not be constant.

Assume a simple random sample of $n$ elements and one observation for
each element. According to the model, Equation (4.4), we may now write
the sample mean as follows:

$$\bar{x} = \frac{\sum_i^n t_i}{n} + \frac{\sum_i^n e_i}{n} + \frac{\sum_i^n \epsilon_{ij}}{n}$$

Summation with respect to $j$ is not needed as there is only one observation
for each element in the sample. Under the conditions specified the expected
value of $\bar{x}$ may be expressed as follows:

$$E(\bar{x}) = \bar{T} + \bar{e}$$

where

$$\bar{T} = \frac{\sum_i^N T_i}{N} \quad\text{and}\quad \bar{e} = \frac{\sum_i^N e_i}{N}$$

The variance of $\bar{x}$ is complicated unless some further assumptions are
made. Assume that all covariance terms are zero. Also, assume that the
conditional variance of $\epsilon_{ij}$ is constant for all values of $i$;
that is, let $V(\epsilon_{ij} \mid i) = S_\epsilon^2$. Then, the variance of
$\bar{x}$ is

$$S_{\bar{x}}^2 = \frac{N-n}{N}\,\frac{S_T^2}{n} + \frac{N-n}{N}\,\frac{S_e^2}{n} + \frac{S_\epsilon^2}{n}$$

where

$$S_T^2 = \frac{\sum_i^N (T_i - \bar{T})^2}{N-1}, \qquad S_e^2 = \frac{\sum_i^N (e_i - \bar{e})^2}{N-1}$$

and $S_\epsilon^2$ is the conditional variance of $\epsilon_{ij}$, that is,
$V(\epsilon_{ij} \mid i)$. For this model the variance of $\bar{x}$ does not
diminish to zero as $n \to N$. However, assuming $N$ is large, the variance
of $\bar{x}$, which becomes $S_\epsilon^2 / N$ when $n = N$, is probably
negligible.
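The variance formula for the model of Illustration 4.5 can be checked exactly on a small hypothetical population. Everything in the sketch below is an assumption made for illustration: the four values of T and e (chosen so that their covariance is zero, as the formula requires), the two equally likely values of the variable error component, and the function names.

```python
from fractions import Fraction as F
from itertools import combinations, product

# Hypothetical population, chosen so that T and e have zero covariance.
T = [10, 20, 30, 40]         # true values T_i
e = [1, -1, -1, 1]           # constant components e_i of the response error
eps_values = [-2, 2]         # variable component: each value has probability 1/2

def S2(v):
    """Variance with divisor N - 1."""
    m = F(sum(v), len(v))
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

def exact_variance_of_mean(n):
    """Variance of the sample mean, enumerating samples and error outcomes."""
    N = len(T)
    outcomes = []
    for idx in combinations(range(N), n):           # equally likely samples
        for eps in product(eps_values, repeat=n):   # equally likely errors
            total = sum(T[i] + e[i] + d for i, d in zip(idx, eps))
            outcomes.append(F(total, n))
    m = sum(outcomes) / len(outcomes)
    return sum((x - m) ** 2 for x in outcomes) / len(outcomes)

def formula_variance_of_mean(n):
    """Variance of the sample mean from the three-term formula in the text."""
    N = len(T)
    # The variable component has mean zero, so V(eps | i) is its mean square.
    S2_eps = F(sum(d * d for d in eps_values), len(eps_values))
    fpc = F(N - n, N)
    return fpc * S2(T) / n + fpc * S2(e) / n + S2_eps / n

for n in (2, 4):
    print(n, exact_variance_of_mean(n), formula_variance_of_mean(n))
```

With n equal to N the first two terms vanish but the third does not, which illustrates the remark that the variance does not diminish to zero under this model.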

Definition 4.2. Mean-Square Error. In terms of the theory of expected
values the mean-square error of an estimate, $x'$, is $E(x'-T)^2$ where $T$
is the target value, that is, the value being estimated. From the theory it
is easy to show that

$$E(x'-T)^2 = [E(x')-T]^2 + E[x'-E(x')]^2$$

Thus, the mean-square error, mse, can be expressed as follows:

$$\text{mse} = B^2 + \sigma_{x'}^2 \qquad (4.5)$$

where

$$B = E(x') - T \qquad (4.6)$$

and

$$\sigma_{x'} = \sqrt{E[x'-E(x')]^2} \qquad (4.7)$$

Definition 4.3. Bias. In Equation (4.5), $B$ is the bias in $x'$ as
an estimate of $T$.

Definition 4.4. Precision. The precision of an estimate is the
standard error of the estimate, namely, $\sigma_{x'}$ in Equation (4.7).

Precision is a measure of repeatability. Conceptually, it is a
measure of the dispersion of estimates that would be generated by repetition
of the same sampling and estimation procedures many times under the same
conditions. With reference to the sampling distribution, it is a measure
of the dispersion of the estimates from the center of the distribution and
does not include any indication of where the center of the distribution
is in relation to a target.
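The decomposition of the mean-square error stated in Definition 4.2 follows from adding and subtracting $E(x')$ inside the square. The steps, written out as a sketch using only the symbols already defined, are:

```latex
\begin{aligned}
E(x'-T)^2
  &= E\bigl\{[x'-E(x')] + [E(x')-T]\bigr\}^2 \\
  &= E[x'-E(x')]^2 + 2\,[E(x')-T]\,E[x'-E(x')] + [E(x')-T]^2 \\
  &= E[x'-E(x')]^2 + [E(x')-T]^2 \\
  &= \sigma_{x'}^2 + B^2
\end{aligned}
```

The cross-product term drops out because $E(x')-T$ is a constant and $E[x'-E(x')] = E(x') - E(x') = 0$.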

In Illustrations 4.1, 4.2, and 4.3, the target value was implicitly
assumed to be $\bar{X}$; that is, $T$ was equal to $\bar{X}$. Therefore, $B$
was zero and the mean-square error of $x'$ was the same as the variance of
$x'$. In Illustrations 4.4 and 4.5 the picture was broadened somewhat by
introducing response error and examining, theoretically, the impact of
response error on $E(x')$ and $\sigma_{x'}$. In practice many factors have
potential for influencing the sampling distribution of $x'$. That is, the
data in a sample are subject to error that might be attributed to several
sources.

From sample data an estimate, $x'$, is computed and an estimate of the
variance of $x'$ is also computed. How does one interpret the results? In
Illustrations 4.4 and 4.5 we found that response error could be divided
into bias and variance. The error from any source can, at least
conceptually, be divided into bias and variance. An estimate from a sample
is subject to the combined influence of bias and variance corresponding to
each of the several sources of error. When an estimate of the variance
of $x'$ is computed from sample data, the estimate is a combination of
variances that might be identified with various sources. Likewise the
difference between $E(x')$ and $T$ is a combination of biases that might be
identified with various sources.

Figure 4.2 illustrates the sampling distribution of $x'$ for four
different cases: A, no bias and low standard error; B, no bias and large
standard error; C, large bias and low standard error; and D, large bias
and large standard error. The accuracy of an estimator is sometimes defined
as the square root of the mean-square error of the estimator. According
to that definition, we could describe estimators having the four sampling
distributions in Figure 4.2 as follows: In case A the estimator is precise
and accurate; in B the estimator lacks precision and is therefore
inaccurate; in C the estimator is precise but inaccurate because of bias;
and in D the estimator is inaccurate because of bias and low precision.

[Figure 4.2: Examples of four sampling distributions. Panels: A, no bias
and low standard error; B, no bias and large standard error; C, large bias
and low standard error; D, large bias and large standard error. The target
T and the expected value E(x') are marked.]

[Figure 4.3: Sampling distribution. Each small dot corresponds to an
estimate.]
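To make the definition of accuracy concrete, the square root of the mean-square error can be computed for the four cases. The bias and standard-error values below are hypothetical numbers chosen only to mimic the four panels; they are not taken from the figure.

```python
from math import sqrt

def rmse(bias, standard_error):
    """Square root of the mean-square error, mse = B^2 + sigma^2."""
    return sqrt(bias ** 2 + standard_error ** 2)

# Hypothetical (bias, standard error) pairs for cases A, B, C and D.
cases = {"A": (0.0, 1.0), "B": (0.0, 3.0), "C": (3.0, 1.0), "D": (3.0, 3.0)}
accuracy = {name: rmse(b, se) for name, (b, se) in cases.items()}

for name, value in accuracy.items():
    print(name, round(value, 3))
```

Under these assumed numbers only case A has a small root-mean-square error; a large bias (C) or a large standard error (B) alone is enough to make the estimator inaccurate.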

Unfortunately, it is generally not possible to determine, exactly,
the magnitude of bias in an estimate, or of a particular component of bias.
However, evidence of the magnitude of bias is often available from general
experience, from knowledge of how well the survey processes were performed,
and from special investigations. The author accepts a point of view that
the mean-square error is an appropriate concept of accuracy to follow. In
that context, the concern becomes a matter of the magnitude of the mse and
the size of $B$ relative to $\sigma_{x'}$. That viewpoint is important
because it is not possible to be certain that $B$ is zero. Our goal should
be to prepare survey specifications and to conduct survey operations so $B$
is small in relation to $\sigma_{x'}$. Or, one might say we want the mse to
be minimum for a given cost of doing the survey. Ways of getting evidence on
the magnitude of bias are a major subject and are outside the scope of this
publication.

As indicated in the previous paragraph, it is important to know
something about the magnitude of the bias, $B$, relative to the standard
error, $\sigma_{x'}$. The standard error is controlled primarily by the
design of a sample and its size. For many survey populations, as the size of
the sample increases, the standard error becomes small relative to the bias.
In fact, the bias might be larger than the standard error even for samples
of moderate size, for example a few hundred cases, depending upon the
circumstances. The point is that if the mean-square error is to be small,
both $B$ and $\sigma_{x'}$ must be small. The approaches for reducing $B$
are very different from the approaches for reducing $\sigma_{x'}$. The
greater concern about nonsampling error is bias rather than impact on
variance. In the design and selection of samples and in the processes of
doing the survey an effort is made to prevent biases that are "sampling" in
origin. However, in survey work one must be constantly aware of potential
biases and on the alert to minimize biases as well as random error (that is,
$\sigma_{x'}$).
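The way a fixed bias comes to dominate as the sample grows can be seen in a small calculation. The population size, element standard deviation, and bias below are hypothetical, and the bias is assumed not to change with sample size, which is a simplification made for this sketch.

```python
from math import sqrt

def standard_error(S, n, N):
    """Standard error of a simple-random-sample mean: sqrt((N-n)/N * S^2/n)."""
    return sqrt((N - n) / N * S ** 2 / n)

# Hypothetical survey: population size, element standard deviation, fixed bias.
N, S, B = 100_000, 20.0, 1.0

rows = []
for n in (100, 400, 1600, 6400):
    se = standard_error(S, n, N)
    rows.append((n, se, B / se, sqrt(B ** 2 + se ** 2)))

for n, se, ratio, root_mse in rows:
    print(f"n={n:5d}  se={se:.3f}  B/se={ratio:.2f}  sqrt(mse)={root_mse:.3f}")
```

Under these assumptions the standard error falls steadily with n while the root-mean-square error levels off near B, so further increases in sample size buy little accuracy.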

The above discussion puts a census in the same light as a sample.
Results from both have a mean-square error. Both are surveys with
reference to use of results. Uncertain inferences are involved in the use of
results from a census as well as from a sample. The only difference is
that in a census one attempts to get a measurement for all $N$ elements,
but making $n = N$ does not reduce the mse to zero. Indeed, as the sample
size increases, there is no positive assurance that the mse will always
decrease; because, as the variance component of the mse decreases, the
bias component might increase. This can occur especially when the
population is large and items on the questionnaire are such that simple,
accurate answers are difficult to obtain. For a large sample or a census,
compared to a small sample, it might be more difficult to control factors
that cause bias. Thus, it is possible for a census to be less accurate
(have a larger mse) than a sample wherein the sources of error are more
adequately controlled. Much depends upon the kind of information being
collected.

4.5  BIAS  AND  STANDARD  ERROR 

The words "bias," "biased," and "unbiased" have a wide variety of
meaning among various individuals. As a result, much confusion exists,
especially since the terms are often used loosely. Technically, it seems
logical to define the bias in an estimate as being equal to $B$ in Equation
(4.6), which is the difference between the expected value of an estimate
and the target value. But, except for hypothetical cases, numerical values
do not exist for either $E(x')$ or the target $T$. Hence, defining an
unbiased estimate as one where $B = E(x') - T = 0$ is of little, if any,
practical value unless one is willing to accept the target as being equal to
$E(x')$. From a sampling point of view there are conditions that give a
rational basis for accepting $E(x')$ as the target. However, regardless of
how the target is defined, a good practical interpretation of $E(x')$ is
needed.

It has become common practice among survey statisticians to call an
estimate unbiased when it is based on methods of sampling and estimation
that are "unbiased." For example, in Illustration 4.4, $\bar{x}$ would be
referred to as an unbiased estimate, unbiased because the method of sampling
and estimation was unbiased. In other words, since $\bar{x}$ was an unbiased
estimate of $\bar{X}$, $\bar{x}$ could be interpreted as an unbiased
estimate of the result that would have been obtained if all elements in the
population had been measured.

In Illustration 4.5 the expected value of $\bar{x}$ is more difficult to
describe. Nevertheless, with reference to the method of sampling and
estimation, $\bar{x}$ was "unbiased" and could be called an unbiased
estimate even though $E(\bar{x})$ is not equal to $\bar{T}$.

The point is that a simple statement which says, "the estimate is
unbiased" is incomplete and can be very misleading, especially if one is
not familiar with the context and concepts of bias. Calling an estimate
unbiased is equivalent to saying the estimate is an unbiased estimate of
its expected value. Regardless of how "bias" is defined or used, $E(x')$
is the mean of the sampling distribution of $x'$; and this concept of
$E(x')$ is very important because $E(x')$ appears in the standard error,
$\sigma_{x'}$, of $x'$ as well as in $B$. See Equations (4.6) and (4.7).

As  a simple  concept  or  picture  of  the  error  of  an  estimate  from  a 
survey,  the  writer  likes  the  analogy  between  an  estimate  and  a shot  at 
a target  with  a gun  or  an  arrow.  Think  of  a survey  being  replicated 
many  times  using  the  same  sampling  plan,  but  a different  sample  for  each 
replication.  Each  replication  would  provide  an  estimate  that  corresponds 
to  a shot  at  a target. 

In Figure 4.3, each dot corresponds to an estimate from one of the
replicated samples. The center of the cluster of dots is labeled $E(x')$
because it corresponds to the expected value of an estimate. Around the
point $E(x')$ a circle is drawn which contains two-thirds of the points.
The radius of this circle corresponds to $\sigma_{x'}$, the standard error
of the estimate. The outer circle has a radius of two standard errors and
contains 95 percent of the points. The target is labeled $T$. The distance
between $T$ and $E(x')$ is bias, which in the figure is greater than the
standard error.
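The "two-thirds" and "95 percent" figures correspond to the familiar coverage of a normal distribution within one and two standard errors of its mean. The short sketch below computes those probabilities for a one-dimensional normal distribution of estimates; treating the two-dimensional picture of Figure 4.3 in this way is an interpretation made for this example.

```python
from math import erf, sqrt

def coverage(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

within_one = coverage(1)    # roughly two-thirds of the estimates
within_two = coverage(2)    # roughly 95 percent of the estimates

print(round(within_one, 4), round(within_two, 4))
```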

In practice, we usually have only one estimate, $x'$, and an estimate,
$s_{x'}$, of the standard error of $x'$. With reference to Figure 4.3, this
means one point and an estimate of the radius of the circle around $E(x')$
that would contain two-thirds of the estimates in repeated samplings. We
do not know the value of $E(x')$; that is, we do not know where the center
of the circles is. However, when we make a statement about the standard
error of $x'$, we are expressing a degree of confidence about how close a
particular estimate prepared from a survey is to $E(x')$; that is, how
close one of the points in Figure 4.3 probably is to the unknown point
$E(x')$. A judgment as to how far $E(x')$ is from $T$ is a matter of how $T$
is defined and assessment of the magnitude of biases associated with
various sources of error.

Unfortunately,  it  is  not  easy  to  make  a short,  rigorous,  and  complete 
interpretative statement about the standard error of x'. If the estimated
standard  error  of  x'  is  three  percent,  one  could  simply  state  that  fact 
and  not  make  an  interpretation.  It  does  not  help  much  to  say,  for  example, 
that  the  odds  are  about  two  out  of  three  that  the  estimate  is  within  three 
percent  of  its  expected  value,  because  a person  familiar  with  the  concepts 
already  understands  that  and  it  probably  does  not  help  the  person  who  is 
unfamiliar  with  the  concepts.  Suppose  one  states,  "the  standard  error  of 
x'  means  the  odds  are  two  out  of  three  that  the  estimate  is  within  three 
percent  of  the  value  that  would  have  been  obtained  from  a census  taken 
under  identically  the  same  conditions."  That  is  a good  type  of  statement 
to  make  but,  when  one  engages  in  considerations  of  the  finer  points, 
interpretation  of  "a  census  taken  under  identically  the  same  conditions" 
is  needed — especially  since  it  is  not  possible  to  take  a census  under 
identically  the  same  conditions. 

In summary, think of a survey as a fully defined system or process
including all details that could affect an estimate, including: the method
of sampling; the method of estimation; the wording of questions; the order
of the questions on the questionnaire; interviewing procedures; selection,
training, and supervision of interviewers; and editing and processing of
data. Conceptually, the sampling is then replicated many times, holding
all specifications and conditions constant. This would generate a
sampling distribution as illustrated in Figures 4.2 or 4.3. We need to
recognize that a change in any of the survey specifications or conditions,
regardless of how trivial the change might seem, has a potential for
changing the sampling distribution, especially the expected value of $x'$.
Changes in survey plans, even though the definition of the parameters
being estimated remains unchanged, often result in discrepancies that
are larger than the random error that can be attributed to sampling.

The  points  discussed  in  the  latter  part  of  this  chapter  were  included 
to  emphasize  that  much  more  than  a well  designed  sample  is  required  to 
assure  accurate  results.  Good  survey  planning  and  management  calls  for 
evaluation  of  errors  from  all  sources  and  for  trying  to  balance  the  effort 
to  control  error  from  various  sources  so  the  mean-square  error  will  be 
within  acceptable  limits  as  economically  as  possible. 


